Velocity: Online Methods and Data Streams

Carlos Castillo

doi:10.1017/CBO9781316476840.007

One of the main reasons social media is relevant for emergency response is its immediacy. For instance, the first reports on social media about the 2011 Utøya attacks in Norway appeared 12 minutes before the first news report in mainstream media (Perng et al., 2013), and in the 2013 Westgate mall attacks, social media reports appeared within a minute after the attack started, “scooping” mainstream media by more than half an hour. People on the ground can collect and disseminate time-critical information, as well as data for disaster reconnaissance that otherwise would be lost due to the gap between a disaster and their arrival on site (Dashti et al., 2014).

On a lighter note, it has been speculated, jokingly but plausibly, that the damaging seismic waves from an earthquake, traveling at a mere three to five kilometers per second, can be overtaken by social media messages about them, which propagate orders of magnitude faster through airwaves and optical fiber.

In this context, it is not surprising that people who associate social media with immediacy also expect a fast response from governments and other organizations, for instance, expecting help to arrive within a few hours of posting a message on social media (American Red Cross, 2012). Independently of whether those expectations are met or not in the near future, some capacity for rapid response to social media messages needs to be developed.

We recall from Section 1.5 that our main requirements are to create aggregate summaries about broad groups of messages (capturing the “big picture”), and to detect important events that require attention or action (offering “actionable insights”). We now add a new requirement: timeliness.

This chapter describes methods that ensure that the output summaries or insights are generated shortly after the input information required to create them becomes available. The way to achieve this low-latency or real-time data processing is to adapt a computing paradigm known as online processing, or equivalently, to consider that the input data is not a static object, but a continuously flowing data stream.
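
As a rough illustration of the online view of data, the following Python sketch consumes messages one at a time and emits a word-count summary as soon as each time window closes, rather than waiting for the full collection. It is not a method from this chapter: the function name `online_word_counts`, the `(timestamp, text)` message format, the 60-second window, and the assumption that messages arrive in timestamp order are all hypothetical choices made for the example.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def online_word_counts(
    stream: Iterable[Tuple[float, str]],
    window_seconds: int = 60,
) -> Iterator[Tuple[int, Counter]]:
    """Yield (window_start, word_counts) as soon as each window closes.

    Assumes (hypothetically) that messages arrive in timestamp order.
    Only the counts for the current window are kept in memory.
    """
    current_window = None
    counts: Counter = Counter()
    for timestamp, text in stream:
        # Start of the fixed-size window this message falls into.
        window = int(timestamp // window_seconds) * window_seconds
        if current_window is None:
            current_window = window
        if window != current_window:
            # A message from a new window arrived: emit the finished summary.
            yield current_window, counts
            current_window, counts = window, Counter()
        counts.update(text.lower().split())
    # Flush the last, possibly partial, window when the stream ends.
    if current_window is not None:
        yield current_window, counts


if __name__ == "__main__":
    # Toy messages, invented for the example: (seconds since start, text).
    messages = [
        (0, "earthquake downtown"),
        (30, "earthquake felt"),
        (65, "bridge closed"),
    ]
    for start, summary in online_word_counts(messages):
        print(start, summary.most_common(2))
```

Because the function is a generator, each summary becomes available without storing or re-reading earlier messages. An offline version would instead collect every message first and compute all the summaries at the end.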

We begin by explaining how online processing differs from offline processing (§6.1), and present high-level operations on temporal data (§6.2). Then, we describe the framework of event detection (§6.3) and methods for finding events and subevents (§6.4). We also introduce the approach of incremental update summarization (§6.5), and end with a discussion of domain-specific approaches (§6.6).
