Book contents
- Frontmatter
- Dedication
- Contents
- List of Algorithms
- Notation
- Preface
- I Classical Methods
- 1 Multidimensional Data
- 2 Principal Component Analysis
- 3 Canonical Correlation Analysis
- 4 Discriminant Analysis
- Problems for Part I
- II Factors and Groupings
- III Non-Gaussian Analysis
- Problems for Part III
- References
- Author Index
- Subject Index
- Data Index
2 - Principal Component Analysis
from I - Classical Methods
Published online by Cambridge University Press: 05 June 2014
Summary
Mathematics, rightly viewed, possesses not only truth, but supreme beauty (Bertrand Russell, Philosophical Essays No. 4, 1910).
Introduction
One of the aims in multivariate data analysis is to summarise the data in fewer than the original number of dimensions without losing essential information. More than a century ago, Pearson (1901) considered this problem, and Hotelling (1933) proposed a solution to it: instead of treating each variable separately, he considered combinations of the variables. Clearly, the average of all variables is such a combination, but many others exist. Two fundamental questions arise:
How should one choose these combinations?
How many such combinations should one choose?
There is no single strategy that always gives the right answer. This book will describe many ways of tackling at least the first question.
Hotelling's proposal consisted in finding those linear combinations of the variables which best explain the variability of the data. Linear combinations are relatively easy to compute and interpret, and they have convenient mathematical properties. Later methods, such as Multidimensional Scaling, broaden the types of combinations, but this comes at a cost: the mathematical treatment becomes more difficult, and the practical calculations become more complex. The complexity increases with the size of the data, and it is one of the major reasons why Multidimensional Scaling has taken rather longer to regain popularity.
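As a rough illustration of Hotelling's idea, the sketch below compares two linear combinations of the same variables: the average mentioned earlier, rescaled to unit length, and the unit-length combination with the largest variance. The data values, the function names, and the use of power iteration to approximate the maximising direction are assumptions made for this example, not taken from the chapter.

```python
import math

# Hypothetical toy data: 6 observations of 3 variables (values invented for illustration).
data = [
    [2.0, 1.0, 0.5],
    [3.0, 2.5, 0.4],
    [4.0, 2.0, 0.6],
    [5.0, 4.5, 0.5],
    [6.0, 4.0, 0.3],
    [7.0, 6.5, 0.7],
]


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def combination_variance(cov, w):
    """Variance of the linear combination w[0]*X_1 + ... + w[d-1]*X_d."""
    d = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(d) for j in range(d))


def leading_direction(cov, iterations=200):
    """Unit vector of (approximately) maximal variance, found by power iteration."""
    d = len(cov)
    w = [1.0 / math.sqrt(d)] * d  # start from the rescaled average
    for _ in range(iterations):
        v = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        w = [c / norm for c in v]
    return w


cov = covariance_matrix(data)
d = len(cov)

w_avg = [1.0 / math.sqrt(d)] * d  # the average, rescaled to unit length
w_pc = leading_direction(cov)     # combination that best explains the variability

var_avg = combination_variance(cov, w_avg)
var_pc = combination_variance(cov, w_pc)

print("variance along the average direction:", var_avg)
print("variance along the leading direction:", var_pc)
```

Both weight vectors are normalised to unit length so that their variances are comparable; without such a constraint, any combination could be made to look more variable simply by scaling its weights.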
The second question is of a different nature, and its answer depends on the solution to the first.
- Type: Chapter
- Information: Analysis of Multivariate and High-Dimensional Data, pp. 18-69
- Publisher: Cambridge University Press
- Print publication year: 2013