There is no sense in being precise when you don't even know what you're talking about (John von Neumann, 1903–1957).
Cluster Analysis is an exploratory technique that partitions observations into clusters or groupings. In medicine, biology, psychology, marketing or finance, the data of interest are multivariate measurements of objects or individuals. In biology, human blood cells of one or more individuals – such as the HIV flow cytometry data – might be the objects one wants to analyse. Cells with similar multivariate responses are grouped together, and cells whose responses differ considerably from each other are partitioned into different clusters. The analysis of cells from a number of individuals, such as HIV+ and HIV− individuals, may result in different cluster patterns. These differences are informative for the biologist and might allow him or her to draw conclusions about the onset or progression of a disease or a patient's response to treatment.
Clustering techniques are applicable whenever a mountain of data needs to be grouped into manageable and meaningful piles. In some applications we know that the data naturally fall into two groups, such as HIV+ or HIV−, but in many cases the number of clusters is not known. The goal of Cluster Analysis is to determine
• the cluster allocation for each observation, and
• the number of clusters.
For some clustering methods – such as k-means – the user has to specify the number of clusters prior to applying the method.
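To make the role of a user-specified number of clusters concrete, the following is a minimal sketch of k-means in its common form (Lloyd's algorithm), written with the Python standard library only. The function name `kmeans`, its parameters, and the six toy observations are assumptions made for illustration and are not taken from the text. The sketch returns the two quantities named above for a given k: a cluster allocation for each observation and the corresponding cluster centres.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Illustrative Lloyd-style k-means; k must be chosen by the user."""
    rng = random.Random(seed)
    # Start from k distinct observations chosen at random as initial centres.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Allocation step: assign each observation to its nearest centre
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # allocations no longer change, so the algorithm has converged
        labels = new_labels
        # Update step: move each centre to the mean of its allocated observations.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: two well-separated groups of bivariate observations.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
          (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]

labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The call fixes k = 2 in advance. The sketch does not estimate the number of clusters, so a different k would simply produce a different partition of the same observations. Which of the two groups receives label 0 depends on the random starting centres.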