
2 - Introduction to stream processing

from Part 1 - Fundamentals

Published online by Cambridge University Press:  05 March 2014

Henrique C. M. Andrade (J. P. Morgan)
Buğra Gedik (Bilkent University, Ankara)
Deepak S. Turaga (IBM Thomas J. Watson Research Center, New York)

Summary

Overview

Stream processing has been an active and diverse area of research as well as of commercial and open-source development. This diversity of technologies has brought with it new terminology, concepts, and infrastructure necessary for designing and implementing sophisticated applications.

We start this chapter by describing some of the application domains where stream processing technologies have been successfully employed (Section 2.2), focusing on the distinctive characteristics of these applications that make them suitable for the use of stream processing technologies.

These application scenarios allow us to illustrate the motivating requirements that led to the development of multiple information flow processing systems, a class that groups multiple technical approaches to continuous data processing (Section 2.3). We discuss some of its broad subcategories, including active databases, Continuous Query (CQ) systems, publish–subscribe systems, and Complex Event Processing (CEP) systems. All of them are precursors that have helped shape the stream processing paradigm.

We then shift the focus to the conceptual foundations of stream processing technology, its architectural support, and the applications it enables (Section 2.4). We also include an overview of analytics, i.e., the algorithms and knowledge discovery techniques that form the basis of most innovative Stream Processing Applications (SPAs), and a historical perspective on the research that led to the development of several Stream Processing Systems (SPSs).

Finally, we include a survey of academic, open-source, and commercially available SPS implementations, and describe their different characteristics.

Type: Chapter
Information: Fundamentals of Stream Processing: Application Design, Systems, and Analytics, pp. 33–74
Publisher: Cambridge University Press
Print publication year: 2014


