
9 - Design principles and patterns for stream processing applications

from Part IV - Application design and analytics

Published online by Cambridge University Press:  05 March 2014

Henrique C. M. Andrade
Affiliation:
J. P. Morgan
Buğra Gedik
Affiliation:
Bilkent University, Ankara
Deepak S. Turaga
Affiliation:
IBM Thomas J. Watson Research Center, New York

Summary

Overview

In the preceding chapters we described the stream processing programming model and the system architecture that supports it. In this chapter we will describe the principles of stream processing application design, and provide patterns to illustrate effective and efficient ways in which these principles can be put into practice.

We look at functional design patterns [1] and principles that describe effective ways to accomplish stream processing tasks, as well as non-functional ones [1] that address cross-cutting concerns such as scalability, performance, and fault tolerance.

This chapter is organized as follows. Section 9.2 describes functional design patterns and principles, covering the topics of edge adaptation, flow manipulation, and dynamic adaptation. Section 9.3 describes non-functional design patterns and principles, covering the topics of application composition, parallelization, optimization, and fault tolerance.

Functional design patterns and principles

We start by examining functional design patterns and principles, covering edge adaptation, flow manipulation, and dynamic adaptation.

9.2.1 Edge adaptation

SPAs consume data from external sources that come in a variety of formats and are accessed through different protocols. Similarly, the results produced by streaming applications are often consumed by external systems in a variety of formats and through different protocols. We use the term edge adaptation for the process of interacting with external systems to receive and send data.
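The idea of edge adaptation can be illustrated with a minimal sketch. It is not taken from the chapter: the CSV input format, the JSON-lines output format, and the names `csv_source`, `json_sink`, and `run_example` are all assumptions chosen for illustration. A source-side adapter converts external data into internal tuples, and a sink-side adapter converts results back into a format an external consumer expects.

```python
import csv
import io
import json
from typing import Iterable, Iterator, TextIO


def csv_source(lines: Iterable[str]) -> Iterator[dict]:
    """Source-side edge adapter (hypothetical): parse external CSV text
    into internal tuples, represented here as plain dictionaries."""
    for row in csv.DictReader(lines):
        # Convert the external string fields into typed attributes.
        yield {"symbol": row["symbol"], "price": float(row["price"])}


def json_sink(tuples: Iterable[dict], out: TextIO) -> None:
    """Sink-side edge adapter (hypothetical): serialize internal tuples
    as one JSON object per line for an external consumer."""
    for t in tuples:
        out.write(json.dumps(t, sort_keys=True) + "\n")


def run_example() -> str:
    # In-memory stand-ins for an external source and an external sink.
    external_input = io.StringIO("symbol,price\nIBM,182.5\nJPM,57.25\n")
    external_output = io.StringIO()
    # The application logic between the adapters sees only internal tuples.
    filtered = (t for t in csv_source(external_input) if t["price"] > 100.0)
    json_sink(filtered, external_output)
    return external_output.getvalue()


if __name__ == "__main__":
    print(run_example(), end="")
```

Keeping format and protocol handling inside the two adapters means the filtering step in the middle would not need to change if, under these assumptions, the external source switched from CSV to another format.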

Type: Chapter
Information: Fundamentals of Stream Processing: Application Design, Systems, and Analytics, pp. 275–341
Publisher: Cambridge University Press
Print publication year: 2014


References

[1] Wiegers, KE. Software Requirements. Microsoft Press; 2003.
[2] Brownell, D. SAX2. O'Reilly Media; 2002.
[3] The Object Management Group (OMG), Corba; retrieved in September 2010. http://www.corba.org/.
[4] Protocol Buffers – Google's data interchange format; retrieved in August 2011. http://code.google.com/p/protobuf/.
[5] Apache Thrift; retrieved in August 2011. http://thrift.apache.org/.
[6] Elmasri, R, Navathe, S. Fundamentals of Database Systems. Addison Wesley; 2000.
[7] IBM InfoSphere Streams Version 3.0 Information Center; retrieved in June 2011. http://publib.boulder.ibm.com/infocenter/streams/v3r0/index.jsp.
[8] ISO. Information Technology – Database Languages – SQL – Part 3: Call-Level Interface (SQL/CLI). International Organization for Standardization (ISO); 2008. ISO/IEC 9075-3.
[9] Park, Y, King, R, Nathan, S, Most, W, Andrade, H. Evaluation of a high-volume, low-latency market data processing system implemented with IBM middleware. Software: Practice & Experience. 2012;42(1):37–56.
[10] Tanenbaum, A, Wetherall, D. Computer Networks. 5th edn. Prentice Hall; 2011.
[11] Babcock, B, Datar, M, Motwani, R. Load shedding in data stream systems. In: Aggarwal, C, editor. Data Streams: Models and Algorithms. Springer; 2007. pp. 127–146.
[12] Tatbul, N, Çetintemel, U, Zdonik, SB, Cherniack, M, Stonebraker, M. Load shedding in a data stream manager. In: Proceedings of the International Conference on Very Large Databases (VLDB). Berlin, Germany; 2003. pp. 309–320.
[13] Tatbul, N, Çetintemel, U, Zdonik, SB. Staying FIT: efficient load shedding techniques for distributed stream processing. In: Proceedings of the International Conference on Very Large Databases (VLDB). Vienna, Austria; 2007. pp. 159–170.
[14] Chi, Y, Yu, PS, Wang, H, Muntz, RR. LoadStar: a load shedding scheme for classifying data streams. In: Proceedings of the SIAM Conference on Data Mining (SDM). Newport Beach, CA; 2005. pp. 346–357.
[15] Gedik, B, Wu, KL, Yu, PS. Efficient construction of compact source filters for adaptive load shedding in data stream processing. In: Proceedings of the IEEE International Conference on Data Engineering (ICDE). Cancun, Mexico; 2008. pp. 396–405.
[16] Gedik, B, Wu, KL, Yu, PS, Liu, L. GrubJoin: an adaptive, multi-way, windowed stream join with time correlation-aware CPU load shedding. IEEE Transactions on Data and Knowledge Engineering (TKDE). 2007;19(10):1363–1380.
[17] Molloy, M. Fundamentals of Performance Modeling. Prentice Hall; 1998.
[18] Fallside, DC, Walmsley, P. XML Schema Part 0: Primer – Second Edition. World Wide Web Consortium (W3C); 2004. http://www.w3.org/TR/xmlschema-0/.
[19] ISO. Information Processing – Text and Office Systems – Standard Generalized Markup Language (SGML). International Organization for Standardization (ISO); 1986. ISO 8879.
[20] Booth, D, Haas, H, McCabe, F, Newcomer, E, Champion, M, Ferris, C, et al. Web Services Architecture – W3C Working Group Note. World Wide Web Consortium (W3C); 2004. http://www.w3.org/TR/ws-arch/.
[21] Pemberton, S. XHTML 1.0 The Extensible HyperText Markup Language (Second Edition). World Wide Web Consortium (W3C); 2002. http://www.w3.org/TR/xhtml1/.
[22] Cadenhead, R, RSS Board. RSS 2.0 Specification. RSS Advisory Board; 2009. http://www.rssboard.org/rss-specification.
[23] Nottingham, M, Sayre, R. The Atom Syndication Format. The Internet Engineering Task Force (IETF); 2005. RFC 4287.
[24] Clark, J, DeRose, S. XML Path Language (XPath) Version 1.0. World Wide Web Consortium (W3C); 1999. http://www.w3.org/TR/xpath/.
[25] Hégaret, PL. Document Object Model (DOM). World Wide Web Consortium (W3C); 2008. http://www.w3.org/DOM/.
[26] Vitter, JS. Random sampling with a reservoir. ACM Transactions on Mathematical Software (TOMS). 1985;11(1):37–57.
[27] Cormen, TH, Leiserson, CE, Rivest, RL. Introduction to Algorithms. MIT Press and McGraw Hill; 1990.
[28] Twitter; retrieved in March 2011. http://www.twitter.com/.
[29] Bouillet, E, Feblowitz, M, Feng, H, Ranganathan, A, Riabov, A, Udrea, O, et al. MARIO: middleware for assembly and deployment of multi-platform flow-based applications. In: Proceedings of the ACM/IFIP/USENIX International Middleware Conference (Middleware). Urbana, IL; 2009. p. 26.
[30] Jacques-Silva, G, Gedik, B, Wagle, R, Wu, KL, Kumar, V. Building user-defined runtime adaptation routines for stream processing applications. Proceedings of the VLDB Endowment. 2012;5(12):1826–1837.
[31] Hennessy, JL, Patterson, DA. Computer Architecture: A Quantitative Approach. 2nd edn. Morgan Kaufmann; 1996.
[32] Marr, DT, Binns, F, Hill, DL, Hinton, G, Koufaty, DA, Miller, AJ, et al. Hyper-threading technology architecture and microarchitecture. Intel Technology Journal. 2002;6(1):4–15.
[33] Andrade, H, Gedik, B, Wu, KL, Yu, PS. Processing high data rate streams in System S. Journal of Parallel and Distributed Computing (JPDC). 2011;71(2):145–156.
[34] Amdahl, G. Validity of the single processor approach to achieving large-scale computing capabilities. In: Proceedings of the American Federation of Information Processing Societies Conference (AFIPS). Anaheim, CA; 1967. pp. 483–485.
[35] Molina, HG, Ullman, JD, Widom, J. Database Systems: The Complete Book. Prentice Hall; 2008.
[36] Zhang, X, Andrade, H, Gedik, B, King, R, Morar, J, Nathan, S, et al. Implementing a high-volume, low-latency market data processing system on commodity hardware using IBM middleware. In: Proceedings of the Workshop on High Performance Computational Finance (WHPCF). Portland, OR; 2009. article no. 7.
[37] Jacques-Silva, G, Gedik, B, Andrade, H, Wu, KL. Language-level checkpointing support for stream processing applications. In: Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). Lisbon, Portugal; 2009. pp. 145–154.
[38] Jacques-Silva, G, Gedik, B, Andrade, H, Wu, KL. Fault-injection based assessment of partial fault tolerance in stream processing applications. In: Proceedings of the ACM International Conference on Distributed Event Based Systems (DEBS). New York, NY; 2011. pp. 231–242.
[39] Jacques-Silva, G, Kalbarczyk, Z, Gedik, B, Andrade, H, Wu, KL, Iyer, RK. Modeling stream processing applications for dependability evaluation. In: Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). Hong Kong, China; 2011. pp. 430–441.
