In this chapter we establish the mathematical notation used throughout this book and introduce the basic foundations of machine learning on which this text builds. Readers already familiar with the field can skim this chapter to learn our notation. For a more thorough treatment of machine learning, the reader should refer to a text such as (Hastie, Tibshirani, & Friedman 2003) or (Vapnik 1995).
Here we give a brief overview of the formal notation we use throughout this text. For more detail, along with background on basic logic, set theory, linear algebra, mathematical optimization, and probability, we refer the reader to Appendix A.
We use $=$ to denote equality and $\triangleq$ to denote "is defined as." The typeface of a character distinguishes among elements of a set, sets, and spaces as follows. Individual objects such as scalars are denoted in italic font (e.g., $x$), and multidimensional vectors are denoted in bold font (e.g., $\mathbf{x}$). A set is denoted using blackboard bold characters (e.g., $\mathbb{X}$). However, when referring to the entire set or universe that spans a particular kind of object (i.e., a space), we use calligraphic script, as in $\mathcal{X}$, to distinguish it from subsets $\mathbb{X}$ contained within this space.
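As a brief illustration of these typeface conventions, consider the following expression. The dimension $D$, the count $N$, and the superscript indexing $\mathbf{x}^{(i)}$ are placeholders chosen only for this example, not notation fixed elsewhere in the text.

$$
\mathbf{x} = (x_1, x_2, \ldots, x_D)^\top \in \mathcal{X},
\qquad
\mathbb{X} \triangleq \{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots, \mathbf{x}^{(N)}\} \subseteq \mathcal{X}.
$$

Here each $x_d$ is an italic scalar, $\mathbf{x}$ is a bold vector, $\mathbb{X}$ is a blackboard-bold (finite) set introduced with $\triangleq$, and $\mathcal{X}$ is the calligraphic space from which all such vectors are drawn.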
Statistical Machine Learning
Machine learning is a vast field that encompasses techniques for extracting information from data, as well as the theory and analysis of these algorithms. In describing the task of machine learning, Mitchell (1997) wrote,
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
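To make the roles of E, T, and P in Mitchell's definition concrete, the following Python sketch uses a toy setup that is our own assumption, not an example from this text: the task T is labeling one-dimensional points as class 0 or class 1, the experience E is a set of labeled examples drawn from two Gaussians centered at $-1$ and $+1$, and the performance measure P is accuracy on a held-out test set. The learner is a simple nearest-centroid rule; the function names are hypothetical.

```python
import random

def make_data(n, rng):
    """Draw n labeled points per class: class 0 ~ N(-1, 1), class 1 ~ N(+1, 1)."""
    data = [(rng.gauss(-1.0, 1.0), 0) for _ in range(n)]
    data += [(rng.gauss(1.0, 1.0), 1) for _ in range(n)]
    return data

def train(data):
    """Experience E: estimate one centroid (sample mean) per class."""
    centroids = {}
    for label in (0, 1):
        xs = [x for x, y in data if y == label]
        centroids[label] = sum(xs) / len(xs)
    return centroids

def predict(centroids, x):
    """Task T: assign x to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def accuracy(centroids, data):
    """Performance measure P: fraction of points labeled correctly."""
    correct = sum(predict(centroids, x) == y for x, y in data)
    return correct / len(data)

rng = random.Random(0)          # fixed seed so runs are repeatable
test_set = make_data(1000, rng)  # held-out data used only to measure P
for n in (1, 5, 50, 500):
    model = train(make_data(n, rng))
    print(f"n = {n:4d} per class -> test accuracy {accuracy(model, test_set):.3f}")
```

Under these assumptions the best achievable accuracy is about 0.84 (threshold at 0, error $P(Z > 1) \approx 0.16$). With very little experience the learned centroids can be far off and accuracy may be poor, while with more experience it typically approaches that limit, which is the sense in which the program "learns."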
This definition encompasses a broad class of methods. We present an overview of the terminology and mechanisms for a particular notion of learning that is often referred to as statistical machine learning. In particular, the notion of experience is cast as data, the task is to choose an action (or make a prediction/decision) from an action or
Figure 2.1 Diagrams depicting the flow of information through different phases of learning. (a) All major phases of the learning algorithm except for model selection.