Principal component analysis (PCA) is commonly used for dimension reduction and data summarization. However, principal components (PCs) are not easily interpreted. To aid their interpretation, this study compares two methods of approximating PCs with the input variables. The first uses the PCA loadings to show how the input variables are projected onto the PCs. The second uses forward stepwise regression to determine the proportion of each PC's variance that is explained by the input variables.
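A minimal sketch of the regression-based approximation, assuming greedy forward selection ranked by the R² of an ordinary least-squares fit and a stopping rule of R² > 0.8 (the study's own entry and exit criteria are not stated here). The helpers `r_squared` and `forward_stepwise` are hypothetical, and the data are synthetic variables x1 to x3, not CHMS measures.

```python
import numpy as np


def r_squared(y, X):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def forward_stepwise(y, X, names, threshold=0.8):
    """Greedy forward selection of input variables to approximate y (e.g. a PC score).

    At each step, add the variable that gives the largest R^2 together with the
    variables already chosen. Stop as soon as R^2 exceeds `threshold` (an
    assumed stopping rule) or every variable has been added.
    Returns a list of (variable name, cumulative R^2) in order of entry.
    """
    remaining = list(range(X.shape[1]))
    chosen, path = [], []
    while remaining:
        best = max(remaining, key=lambda j: r_squared(y, X[:, chosen + [j]]))
        chosen.append(best)
        remaining.remove(best)
        path.append((names[best], r_squared(y, X[:, chosen])))
        if path[-1][1] > threshold:
            break
    return path


# Synthetic example (hypothetical variables): x1 and x2 are strongly
# correlated, x3 is independent noise.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Center and scale each column, then take the PC1 score from the SVD.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
pc1 = U[:, 0] * s[0]

path = forward_stepwise(pc1, Z, ["x1", "x2", "x3"])
for name, r2 in path:
    print(f"{name}: cumulative R^2 = {r2:.3f}")
```

Because a PC score is an exact linear combination of the standardized inputs, R² always reaches 1 once every variable is included; the informative quantity is how few leading variables are needed to pass the threshold.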
Two data sets derived from the Canadian Health Measures Survey (CHMS) were used to test PC approximation: a spirometry subset containing the measures from the first spirometry trial, and a full data set containing representative variables. All variables were centered and scaled. PCA was conducted with 282 and 23 variables, respectively. The PCs were then approximated with the two methods.
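Centering and scaling every variable before PCA is equivalent to running PCA on the correlation matrix. The sketch below, assuming NumPy and a hypothetical helper `standardized_pca`, computes the PC scores, the loadings used by the first approximation method, and each PC's share of total variance. Loadings here are unit-length eigenvectors; some software instead scales them by the square root of the eigenvalue. The data are synthetic variables v1 to v4.

```python
import numpy as np


def standardized_pca(X):
    """PCA on centered and scaled data (equivalent to PCA of the correlation matrix).

    Returns:
      scores:    n x k matrix of PC scores
      loadings:  p x k matrix; column j holds each input variable's weight on PC j
      explained: proportion of total variance carried by each PC
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # center and scale each column
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s              # equals Z @ Vt.T
    loadings = Vt.T             # the sign of each column is arbitrary
    explained = s**2 / np.sum(s**2)
    return scores, loadings, explained


# Synthetic example: v1 and v2 share a common factor, v3 and v4 are noise.
rng = np.random.default_rng(1)
n = 300
a = rng.normal(size=n)
X = np.column_stack([a, a + 0.5 * rng.normal(size=n),
                     rng.normal(size=n), rng.normal(size=n)])
names = ["v1", "v2", "v3", "v4"]

scores, loadings, explained = standardized_pca(X)
print(f"PC1 explains {100 * explained[0]:.1f}% of total variance")
# Rank input variables by the magnitude of their PC1 loading.
order = np.argsort(-np.abs(loadings[:, 0]))
print([names[i] for i in order])
```

Ranking variables by absolute loading shows which inputs a PC is built from, whereas the regression approach reports how much of the PC's variance those inputs actually reproduce.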
The first PC (PC1) explained 12.1 percent and 50.3 percent of the total variance in the respective data sets. The leading variable explained 89.6 percent and 79.0 percent of the variance of PC1 in the respective data sets, so one and two variables, respectively, were required to explain more than 80 percent of the variance of PC1. In the full data set, measures related to physical development were the leading variables approximating PC1, and lung function variables were the leading variables approximating PC2. In the spirometry subset, the leading variables approximating PC1 were the ratio of forced expiratory volume in 0.5 seconds (FEV0.5) to forced vital capacity (FVC), expressed in percent, and FEV1/FVC (percent).
Approximating PCs with input variables was highly feasible and aided the interpretation of PCs, especially the first few PCs. This approach is also useful for identifying the major or unique sources of variance in a data set. In the full data set, the variables related to physical development were associated with the largest share of variation. The leading variable in the spirometry subset, FEV0.5/FVC (percent), has not been well studied for clinical use.