Index mining is a new discipline that searches for the composite measures or indices most relevant to given contexts or outcomes. After reviewing three frailty indices and a set of principal component (PC)-based indices, we show situations that can lead to ineffective indices, that is, indices that contain bias or fail to represent the underlying theory.
We reproduced and reviewed the three frailty indices and the 134,689 PC-based indices from previous publications. We analyzed the impact of aggregating the input variables on the final indices using forward stepwise regression.
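A minimal sketch of how forward stepwise regression can decompose an index into its inputs, assuming a synthetic index built as a weighted sum of three of four standard-normal variables. The helper names `r_squared` and `forward_stepwise`, the weights, and the data are hypothetical and are not the authors' code. At each step the variable that raises R² the most is added, so the cumulative R² shows how much of the index each input accounts for.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var()


def forward_stepwise(X, y):
    """Greedy forward selection: at each step add the column that raises R^2 the most.

    Returns (column index, cumulative R^2) pairs in the order the columns entered.
    """
    selected, path = [], []
    remaining = list(range(X.shape[1]))
    while remaining:
        best = max(remaining, key=lambda j: r_squared(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
        path.append((best, r_squared(X[:, selected], y)))
    return path


# Hypothetical example: an index built as a weighted sum of three of four inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
index = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * X[:, 2]

path = forward_stepwise(X, index)
for j, r2 in path:
    print(f"x{j}: cumulative R^2 = {r2:.3f}")
```

With these weights, x0 should enter first, and the cumulative R² should reach 1 once x0, x1 and x2 are included, because the index is an exact linear combination of them; x3 then adds nothing.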
Several methods of combining the input variables were associated with ineffective projection of information onto the indices. Three examples illustrate the most common causes of ineffective summation, involving input variables that were positively correlated, negatively correlated, or uncorrelated with the outcome. Ineffective indices were often created by summing redundant information or uncorrelated variables.
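The effect of summation on an index can be illustrated with simulated data. This sketch assumes a hypothetical outcome `y` driven by one positively related and one negatively related variable, then compares the correlation of `y` with indices formed by summing z-scores in different ways. The variable names and data are illustrative only and are not the frailty or PC-based indices reviewed here.

```python
import numpy as np


def zscore(v):
    """Center and scale a variable to mean 0 and standard deviation 1."""
    return (v - v.mean()) / v.std()


rng = np.random.default_rng(42)
n = 20_000
x_pos = rng.normal(size=n)                 # positively related to the outcome
x_neg = rng.normal(size=n)                 # negatively related to the outcome
x_unc = rng.normal(size=n)                 # unrelated to the outcome
x_dup = x_pos + 0.1 * rng.normal(size=n)   # near-duplicate of x_pos (redundant)
y = x_pos - x_neg + rng.normal(size=n)     # hypothetical outcome

indices = {
    "signed sum (reference)": zscore(x_pos) - zscore(x_neg),
    "unsigned sum (+x_neg)": zscore(x_pos) + zscore(x_neg),
    "reference + uncorrelated": zscore(x_pos) - zscore(x_neg) + zscore(x_unc),
    "reference + redundant": zscore(x_pos) + zscore(x_dup) - zscore(x_neg),
}

# Correlation of each candidate index with the outcome.
corr = {name: np.corrcoef(idx, y)[0, 1] for name, idx in indices.items()}
for name, r in corr.items():
    print(f"{name:28s} r = {r:.3f}")
```

In expectation the correlations are about 0.82 for the signed sum, 0 for the unsigned sum (the positive and negative contributions cancel), 0.67 after adding the uncorrelated variable, and about 0.77 after adding the near-duplicate, which double-counts the information in `x_pos`.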
Ineffective indices can be avoided if the relationships between input variables and outcomes are properly scrutinized. The creation of composite measures and indices is still under active development. The three examples we identified are mistakes that may be repeated unintentionally and need to be addressed with explicit rules. A reporting guide for the creation of composite measures has been proposed. We recommend reviewing the index objectives, data characteristics, and data limitations before creating composite measures or indices.
Principal component analysis (PCA) is an important tool for summarizing data and reducing dimensionality. However, one disadvantage of PCA is that the principal components (PCs) can be hard to interpret, especially in a high-dimensional database. This study analyzes the patterns of variance accumulation according to PCA loadings and approximates PCs with input variables from sample data sets.
Three data sets of various sizes were used to assess the performance of PC approximation: Hitters; the SF-12v2 subset of the 2004 to 2011 Medical Expenditure Panel Survey (MEPS); and the full set of 1996 to 2011 MEPS data. The variables in all three data sets were centered and scaled before PCA. PC approximation was studied with two approaches. The first squared the PC loadings to estimate the contribution of each variable to the variance of each PC. The second used forward stepwise regression to approximate PCs with all input variables.
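The loading-based approach can be sketched with NumPy, assuming a small synthetic data set in which three variables share one latent source and two are independent; the data and function names are hypothetical. Variables are centered and scaled, PCA is computed through a singular value decomposition, and the squared PC1 loadings are read as the approximate share of each variable in that PC. Because each loading vector has unit length, its squared entries sum to one.

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def pca(X):
    """PCA of an already standardized matrix via SVD.

    Returns loadings (columns = PCs, each of unit length), PC scores,
    and the proportion of total variance carried by each PC.
    """
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    loadings = Vt.T
    scores = X @ loadings
    explained = s**2 / np.sum(s**2)
    return loadings, scores, explained


# Hypothetical data: three variables sharing one source, two independent ones.
rng = np.random.default_rng(1)
latent = rng.normal(size=(1000, 1))
X_raw = np.hstack([
    latent + 0.3 * rng.normal(size=(1000, 3)),
    rng.normal(size=(1000, 2)),
])
X = standardize(X_raw)
loadings, scores, explained = pca(X)

# Squared PC1 loadings, read as each variable's approximate share of PC1.
contrib_pc1 = loadings[:, 0] ** 2
print("variance explained by each PC:", np.round(explained, 3))
print("squared PC1 loadings:", np.round(contrib_pc1, 3))
```

Here PC1 should carry roughly 57 percent of the total variance (the three related variables have pairwise correlations near 0.92), and nearly all of its squared loadings should fall on those three variables. Squared loadings do not account for redundancy among the inputs, so several near-duplicate variables can all rank highly.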
The first few PCs represented large portions of the total variance in each data set. In all three data sets, approximating PCs with stepwise regression identified the input variables that explain large portions of PC variance more efficiently than approximating them according to PCA loadings. Only a few variables were required to explain more than eighty percent of the PC variance.
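The comparison between the two approaches can be sketched on simulated data. This sketch assumes a hypothetical matrix with three near-duplicates of one source, two near-duplicates of a second, correlated source, and one unrelated variable. It enters variables into a regression on the PC1 scores either by decreasing squared loading or by forward stepwise selection, and counts how many are needed before the cumulative R² exceeds 0.8. The helper names are illustrative, and which ordering needs fewer variables depends on the data; the sketch only shows how the comparison can be computed.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var()


def cumulative_r2(X, y, order):
    """Cumulative R^2 when the columns of X are entered in the given order."""
    return [r_squared(X[:, order[: k + 1]], y) for k in range(len(order))]


def stepwise_order(X, y):
    """Forward stepwise order: greedily add the column that most increases R^2."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        best = max(remaining, key=lambda j: r_squared(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected


def n_to_threshold(r2_path, threshold=0.8):
    """Number of variables needed before the cumulative R^2 exceeds the threshold."""
    return next(k + 1 for k, r2 in enumerate(r2_path) if r2 > threshold)


# Hypothetical data with two correlated sources, near-duplicates, and noise.
rng = np.random.default_rng(7)
n = 2000
a = rng.normal(size=n)
b = 0.5 * a + rng.normal(size=n)
noise = 0.2 * rng.normal(size=(n, 5))
X_raw = np.column_stack([
    a + noise[:, 0], a + noise[:, 1], a + noise[:, 2],  # near-duplicates of a
    b + noise[:, 3], b + noise[:, 4],                   # near-duplicates of b
    rng.normal(size=n),                                 # unrelated variable
])
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)   # center and scale
_, _, Vt = np.linalg.svd(X, full_matrices=False)
pc1 = X @ Vt[0]                                         # PC1 scores

loading_order = [int(j) for j in np.argsort(-Vt[0] ** 2)]  # largest squared loading first
step_order = stepwise_order(X, pc1)

for name, order in [("loadings", loading_order), ("stepwise", step_order)]:
    path = cumulative_r2(X, pc1, order)
    print(name, order, "variables needed for R^2 > 0.8:", n_to_threshold(path))
```

Both orderings reach an R² of 1 once every variable is included, because a PC is an exact linear combination of the centered and scaled inputs. The first stepwise variable is always at least as good as the top-loading one, since the first stepwise step picks the single best predictor.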
Approximating and interpreting PCs with stepwise regression is highly feasible. Approximating PCs can help i) interpret PCs in terms of input variables, ii) understand the major sources of variance in data sets, iii) select unique sources of information, and iv) search for and rank input variables according to the proportion of PC variance they explain. This approach offers a systematic way to understand databases and to find the variables that are most representative of them.
Composite measures and indices are used in medical research to represent concepts that cannot be measured with a single variable. They can be used to predict outcomes or serve as outcomes in trials. Creating innovative indices is important for increasing publications and securing research funding. However, certain assumptions and problems are prevalent among indices. We aim to develop a reporting guide and an appraisal tool for indices based on the issues we identified.
We reproduced the three frailty indices from a previous publication and 134,689 principal component-based indices. We reviewed the index assumptions, the bias introduced by data processing, and the relationships between input variables, and we interpreted the indices in terms of their input variables.
We identified four major issues to be addressed in a reporting guide: constraints imposed on the input variables by index creation; data processing without an evidence base; indices poorly linked to their input variables; and relatively inferior predictive power. We present a flow diagram and a checklist to report and review these four issues for innovative indices.
A reporting and critical appraisal tool for innovative indices is lacking and needed. These four issues, which need to be considered explicitly, have previously been neglected. This guide is the first attempt to improve the quality and generalizability of innovative indices, and it can be used to lead further discussion with other experts and review committees.
Principal component analysis (PCA) is used for dimension reduction and data summarization. However, principal components (PCs) are not easily interpreted. To interpret PCs, this study compares two methods of approximating them. One uses the PCA loadings to understand how input variables are projected onto PCs. The other uses forward stepwise regression to determine the proportion of PC variance explained by input variables.
Two data sets derived from the Canadian Health Measures Survey (CHMS) were used to test the concept of PC approximation: a spirometry subset with the measures from the first spirometry trial, and the full data set, which contained representative variables. Variables were centered and scaled. PCA was conducted with 282 and 23 variables, respectively. PCs were approximated with the two methods.
The first PC (PC1) explained 12.1 percent and 50.3 percent of the total variance in the respective data sets. The leading variables explained 89.6 percent and 79.0 percent of the variance of PC1 in the respective data sets. One and two variables, respectively, were required to explain more than 80 percent of the variance of PC1. In the full data set, measures related to physical development were the leading variables for approximating PC1, and lung function variables were the leading variables for approximating PC2. The leading variables for approximating PC1 of the spirometry subset were forced expiratory volume in 0.5 seconds (FEV0.5)/forced vital capacity (FVC) (percent) and FEV1/FVC (percent).
Approximating PCs with input variables was highly feasible and helpful for interpreting PCs, especially the first PCs. This method is also useful for identifying major or unique sources of variance in data sets. The variables related to physical development account for the most variation in the full data set. The leading variable in the spirometry subset, FEV0.5/FVC (percent), has not been well studied for clinical use.