Book contents
- Frontmatter
- Dedication
- Contents
- Preface
- Foreword
- Acknowledgements
- List of acronyms
- Part I Fundamentals
- Part II Application development
- Part III System architecture
- Part IV Application design and analytics
- 9 Design principles and patterns for stream processing applications
- 10 Stream analytics: data pre-processing and transformation
- 11 Stream analytics: modeling and evaluation
- Part V Case studies
- Part VI Closing notes
- Keywords and identifiers index
- Index
- References
11 - Stream analytics: modeling and evaluation
from Part IV - Application design and analytics
Published online by Cambridge University Press: 05 March 2014
Summary
Overview
In this chapter we focus on the last two stages of the data mining process and examine techniques for modeling and evaluation. In many ways, these steps form the core of a mining task where automatic or semi-automatic analysis of streaming data is used to extract insights and actionable models. This process employs algorithms specially designed for different purposes, such as the identification of similar groups of data, of unusual groups of data, or of related data, whose associations were previously unknown.
This chapter starts with a description of a methodology that combines offline modeling, where the model for a dataset is initially learned, with online evaluation, where this model is used to analyze the new data being processed by an application (Section 11.2). Although the model is learned offline from previously stored training data, this methodology is frequently used in stream processing applications (SPAs) because it can leverage many of the existing data mining algorithms devised for analyzing datasets stored in databases and data warehouses.
Offline modeling is often sufficient for the analytical goals of many SPAs. Nevertheless, the use of online modeling and evaluation techniques allows a SPA to function autonomically and to evolve in response to changes in the workload and in the data. Needless to say, this is the goal envisioned by proponents of stream processing. Thus, in the rest of the chapter, we examine in detail online techniques for modeling or mining data streams.
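The contrast between the two approaches can be illustrated with a minimal Python sketch. It is not taken from the chapter: the toy model (a running mean and variance used to flag unusual values), the threshold `k`, and all function names are assumptions chosen for illustration. `fit_offline` followed by `evaluate_online` mimics offline modeling with online evaluation, while `model_and_evaluate_online` keeps updating the model as tuples arrive.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GaussianModel:
    """Toy model (hypothetical): running mean and variance of a numeric attribute."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Welford's incremental update: one pass, constant memory.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        # Population standard deviation of the values seen so far.
        return math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0

    def is_unusual(self, x: float, k: float = 3.0) -> bool:
        # Flag values more than k standard deviations from the mean.
        s = self.std()
        return s > 0 and abs(x - self.mean) > k * s


def fit_offline(training: Iterable[float]) -> GaussianModel:
    """Offline modeling: learn the model from previously stored data."""
    model = GaussianModel()
    for x in training:
        model.update(x)
    return model


def evaluate_online(model: GaussianModel, stream: Iterable[float]) -> List[bool]:
    """Online evaluation: score each new value; the model stays fixed."""
    return [model.is_unusual(x) for x in stream]


def model_and_evaluate_online(model: GaussianModel, stream: Iterable[float]) -> List[bool]:
    """Online modeling: score each new value, then fold it into the model."""
    flags = []
    for x in stream:
        flags.append(model.is_unusual(x))
        model.update(x)
    return flags


stored = [10, 11, 9, 10, 10, 11, 9, 10]  # made-up training data
incoming = [10, 50, 50]                  # made-up stream with a level shift

print(evaluate_online(fit_offline(stored), incoming))
print(model_and_evaluate_online(fit_offline(stored), incoming))
```

With this made-up data, the fixed model flags both occurrences of 50, whereas the continuously updated model flags only the first one and then absorbs the shift. Whether that adaptation is desirable depends on the application; the sketch only shows the mechanical difference between the two methodologies.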
- Type: Chapter
- Information: Fundamentals of Stream Processing: Application Design, Systems, and Analytics, pp. 388-438. Publisher: Cambridge University Press. Print publication year: 2014