  • Print publication year: 2013
  • Online publication date: June 2014

1 - Multidimensional Data

from I - Classical Methods

Summary

Denken ist interessanter als Wissen, aber nicht als Anschauen (Johann Wolfgang von Goethe, Werke – Hamburger Ausgabe Bd. 12, Maximen und Reflexionen, 1749–1832). Thinking is more interesting than knowing, but not more interesting than looking at.

Multivariate and High-Dimensional Problems

Early in the twentieth century, scientists such as Pearson (1901), Hotelling (1933) and Fisher (1936) developed methods for analysing multivariate data in order to

• understand the structure in the data and summarise it in simpler ways;

• understand the relationship of one part of the data to another part; and

• make decisions and inferences based on the data.

The early methods these scientists developed are linear; their conceptual simplicity and elegance still strike us today as natural and surprisingly powerful. Principal Component Analysis deals with the first topic in the preceding list, Canonical Correlation Analysis with the second and Discriminant Analysis with the third. As time moved on, more complex methods were developed, often arising in areas such as psychology, biology or economics, but these linear methods have not lost their appeal. Indeed, as we have become more able to collect and handle very large and high-dimensional data, renewed requirements for linear methods have arisen. In these data sets essential structure can often be obscured by noise, and it becomes vital to reduce the original data in such a way that informative and interesting structure in the data is preserved while noisy, irrelevant or purely random variables, dimensions or features are removed, as these can adversely affect the analysis.
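As a rough illustration of this kind of reduction, the sketch below applies Principal Component Analysis, computed through NumPy's singular value decomposition, to synthetic data. Everything specific here is an assumption made for illustration and is not taken from the text: the helper name `pca_reduce`, the simulated data (a two-dimensional signal embedded in ten dimensions plus small noise) and the choice of keeping two components.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto the first k principal component directions.

    Hypothetical helper for illustration only.
    Returns the k-dimensional scores and the fraction of total variance kept.
    """
    # Centre each variable (column) at its sample mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing
    # singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return scores, explained


# Synthetic data (an assumption): 200 observations of 10 variables whose
# structure lives in a 2-dimensional subspace, plus small isotropic noise.
rng = np.random.default_rng(0)
n, d = 200, 10
latent = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, d))
signal = 3.0 * (latent @ loadings)
noise = rng.normal(scale=0.1, size=(n, d))
X = signal + noise

scores, explained = pca_reduce(X, k=2)
print(scores.shape)
print(f"fraction of variance kept: {explained:.3f}")
```

In this simulated setting the two retained components carry almost all of the variance, because the noise is small relative to the signal; with real data the appropriate number of components is a modelling decision rather than something known in advance.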