The last few years have witnessed a surge in the development of predictive classifiers that estimate the probability that an individual will develop a particular mental health condition at some point in the future. These efforts are potentially important because they could contribute significantly to the development of preventive approaches (Cannon et al. 2016). While such classifiers have yet to be adopted generally in clinical practice, recent influential models reporting high predictive values (from ~70–100%) raise such prospects, and have spurred startup companies to generate enthusiasm and attract investors (Hayden, 2017), as well as ambitious strategies for prevention (Couzin-Frankel, 2017).
Within psychiatric conditions, predictive classifiers have probably been applied most often in autism and psychosis spectrum disorder populations (ASD and PSD, respectively). Researchers suggest that various indices, from simple demographic to biological measures, could be used in clinical practice to improve diagnosis of ASD (Pramparo et al. 2015; Yahata et al. 2016; Howsmon et al. 2017; Emerson et al. 2017; Hazlett et al. 2017) and PSD (Cannon et al. 2008; Bedi et al. 2015; Cannon et al. 2016; Fusar-Poli et al. 2017; Hafeman et al. 2017), within both general and high-risk populations. However, since these classifiers can have critical implications for public health, inspecting the validity of their predictive values is important because ‘false diagnostic predictions have the potential to adversely affect individuals and families’ (Hazlett et al. 2017).
Generally, the accuracy of predictive classifiers depends on: (1) the data collected before the onset of the condition (which could include but not be limited to demographic, clinical, genetic, and brain-based markers/indices), (2) the clinical classification instrument used to determine the presence or absence of the condition, and (3) the prevalence of diagnosed individuals in the test population. While classifiers have been criticized on both methodological and statistical grounds (Studerus et al. 2017), accounting for the epidemiological prevalence of the condition in question has largely been overlooked. When epidemiological prevalence is not adjusted for, which is unfortunately commonplace (Cannon et al. 2008; Sundermann et al. 2014; Bedi et al. 2015; Pramparo et al. 2015; Yahata et al. 2016; Emerson et al. 2017; Hazlett et al. 2017; Just et al. 2017), the clinical potential of classifiers is often seriously overestimated, even for enriched, high-risk populations.
The clinical utility of the classifier is estimated in terms of two values: the positive predictive value (PPV; i.e. how likely it is that an individual classified as having the condition actually has it) and the negative predictive value (NPV; i.e. how likely it is that an individual classified as not having the condition actually does not have it). Importantly, these values are sensitive to the prevalence of the condition in the population of interest. In cases where there is a mismatch between the prevalence of the condition in the test sample and its prevalence in the population (general or high-risk), the clinical value of the classifier needs to be estimated by using Bayes' theorem to calculate positive and negative predictive values adjusted for the prevalence of the condition in the population.
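For reference, the Bayes-adjusted predictive values can be written in their standard textbook form as below. The symbol p is introduced here purely for illustration and denotes the prevalence of the condition in the population of interest; sensitivity and specificity are those estimated for the classifier.

```latex
\begin{align}
\mathrm{PPV} &= \frac{\text{sensitivity} \times p}
                     {\text{sensitivity} \times p + (1 - \text{specificity}) \times (1 - p)} \\[6pt]
\mathrm{NPV} &= \frac{\text{specificity} \times (1 - p)}
                     {\text{specificity} \times (1 - p) + (1 - \text{sensitivity}) \times p}
\end{align}
```

When p equals the proportion of affected individuals in the test sample, these expressions reduce to the sample-based PPV and NPV; substituting an epidemiological estimate of p gives the adjusted values.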
We emphasize that what we are presenting here is not a new analysis or method, but an overlooked necessary step for calculating a classifier's predictive value and thus estimating its clinical utility. In fact, the influence of the prevalence of disease on the predictive values of diagnostic/screening classifiers has long been recognized: increasing prevalence increases PPV and decreases NPV (Mausner & Kramer, 1985; Altman & Bland, 1994), as also shown in our prior study of ASD (Skafidas et al. 2014).
We illustrate this point by examining two recent influential studies reporting on the promise of such diagnostic classifiers. The first study (Hazlett et al. 2017) reports on a diagnostic classifier that uses brain surface area of 6–12-month-old siblings of children with ASD to predict whether these infants would develop the condition at age 24 months. It is reported that a deep-learning algorithm that primarily used this brain measure correctly classified which of the infants developed the condition with 94% accuracy, 88% sensitivity, and 95% specificity. This corresponded to 81% PPV and 97% NPV.
Relevant to our argument, the reported sensitivity and specificity values in this study were based on the analysis of 179 infants at high familial risk, of whom 34 developed ASD at 24 months of age. The crucial point to which we would like to draw the readers’ attention is that the resultant predictive values are based on the prevalence of ASD in this test sample, which is 19% (34/179). However, the prevalence of ASD among children who have siblings with ASD is likely to be overestimated in this sample (Szatmari et al. 2016), and estimates from a large population study suggest a much lower prevalence of 6.9% for full siblings, 2.4% for maternal half-siblings, and 1.5% for paternal half-siblings (Gronborg et al. 2013). Under such uncertainty about whether prevalence rates are over- or under-estimated, the clinical utility of the classifier simply cannot be evaluated; substantiated epidemiological prevalence rates are a necessary first step to assessing the true predictive validity of any diagnostic classifier. When adjusting for a prevalence of 6.9%, for example, their test with 88% sensitivity and 95% specificity yields 57% PPV and 99% NPV. These values translate to a high false discovery rate of 43% (i.e. the probability that an infant classified as having the condition does not in fact have it), and a low false omission rate of 1% (i.e. the probability that an infant classified as not having the condition does in fact have it). The high false discovery rate substantially undermines the clinical utility of the classifier to detect risk for ASD among infants at high familial risk for ASD.
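The adjustment described above can be checked with a few lines of Python. This is a minimal sketch rather than the procedure used in any of the cited studies: the function name `adjusted_predictive_values` and the variable names are hypothetical, and the inputs are the rounded sensitivity, specificity and prevalence figures quoted in the text, so the sample-based results only approximately reproduce the reported 81% PPV and 97% NPV.

```python
def adjusted_predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given prevalence, using Bayes' theorem.

    All arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Rounded values quoted in the text for the Hazlett et al. (2017) classifier.
SENSITIVITY = 0.88
SPECIFICITY = 0.95

sample_prevalence = 34 / 179   # proportion of affected infants in the test sample
sibling_prevalence = 0.069     # full siblings, Gronborg et al. (2013)

ppv_sample, npv_sample = adjusted_predictive_values(
    SENSITIVITY, SPECIFICITY, sample_prevalence
)
ppv_adj, npv_adj = adjusted_predictive_values(
    SENSITIVITY, SPECIFICITY, sibling_prevalence
)

print(f"Sample prevalence {sample_prevalence:.1%}: "
      f"PPV {ppv_sample:.1%}, NPV {npv_sample:.1%}")
print(f"Adjusted prevalence {sibling_prevalence:.1%}: "
      f"PPV {ppv_adj:.1%}, NPV {npv_adj:.1%}")
```

Only the prevalence argument changes between the two calls, which makes explicit that the drop in PPV comes entirely from the lower base rate and not from any change in the classifier itself.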
In a second study, Pramparo et al. (2015) reported that genomic biomarkers correctly classified 83% of boys with ASD in general pediatric settings in the discovery population (80% specificity and 85% sensitivity) and 75% of the replication sample (72% specificity and 77% sensitivity). These estimates were based on a 52% prevalence in the discovery sample and a 47% prevalence in the replication sample, neither of which reflects the population prevalence of ASD in boys in the USA, currently estimated at 1 in 42 boys, or 2.38% (C.D.C.P, 2012). Using the specificity and sensitivity estimates from the replication sample (Specificity = 72.41%; Sensitivity = 77.27%) while adjusting for epidemiological prevalence, the PPV is only 6.39% and the NPV is 99.24%, reflecting an extremely high false discovery rate of about 94% and an extremely low false omission rate of about 0.8%. These adjusted values undermine the promise of the classifier in ‘detecting risk for ASD among infants in the general pediatric population’ (Pramparo et al. 2015).
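One way to make these adjusted figures concrete is to express them as expected counts in a hypothetical screening cohort. The sketch below is illustrative only: the function name `confusion_counts` and the cohort size of 100,000 boys are assumptions introduced here, and the inputs are the replication-sample sensitivity and specificity and the 2.38% prevalence quoted in the text.

```python
def confusion_counts(sensitivity, specificity, prevalence, cohort_size=100_000):
    """Expected confusion-matrix counts in a hypothetical cohort.

    cohort_size is an arbitrary illustrative value; the derived rates
    do not depend on it.
    """
    with_condition = prevalence * cohort_size
    without_condition = cohort_size - with_condition
    true_pos = sensitivity * with_condition
    false_neg = with_condition - true_pos
    true_neg = specificity * without_condition
    false_pos = without_condition - true_neg
    return true_pos, false_pos, true_neg, false_neg


# Replication-sample estimates and population prevalence quoted in the text.
tp, fp, tn, fn = confusion_counts(
    sensitivity=0.7727, specificity=0.7241, prevalence=0.0238
)

ppv = tp / (tp + fp)
npv = tn / (tn + fn)
false_discovery_rate = fp / (tp + fp)   # flagged, but without the condition
false_omission_rate = fn / (tn + fn)    # not flagged, but with the condition

print(f"Flagged by the classifier: {tp + fp:,.0f}, of whom {tp:,.0f} have ASD")
print(f"PPV {ppv:.2%}, NPV {npv:.2%}")
print(f"False discovery rate {false_discovery_rate:.2%}, "
      f"false omission rate {false_omission_rate:.2%}")
```

Framing the result as counts shows why the PPV collapses at low prevalence: false positives are drawn from the very large unaffected group, so they far outnumber the true positives drawn from the small affected group.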
To illustrate more fully the dependency of predictive values on the prevalence rate within a study cohort compared with the estimates in the population, we repeated the same analysis for the Hazlett et al. (2017) study, adjusting for the prevalence of ASD in the general population (1.13%) (C.D.C.P, 2012), in paternal (1.5%) and maternal (2.4%) half-siblings (Gronborg et al. 2013), and in dizygotic (35%) and monozygotic (70%) twins (Hallmayer et al. 2011) (see Fig. 1). This figure underscores the importance of understanding the sources of variation in the prevalence estimates of the condition of interest and their impact on the predictive values of screening assessments or biological markers.
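The same dependency can be tabulated directly. The following sketch sweeps the prevalence estimates mentioned in the text while holding sensitivity and specificity fixed at the rounded 88% and 95% quoted for Hazlett et al. (2017). The names `PREVALENCES`, `predictive_values` and `rows` are hypothetical, and the output is an illustration of the calculation, not a reproduction of the published figure.

```python
# Prevalence estimates quoted in the text (as proportions).
PREVALENCES = {
    "general population": 0.0113,
    "paternal half-siblings": 0.015,
    "maternal half-siblings": 0.024,
    "full siblings": 0.069,
    "study sample (34/179)": 0.19,
    "dizygotic twins": 0.35,
    "monozygotic twins": 0.70,
}

SENSITIVITY = 0.88
SPECIFICITY = 0.95


def predictive_values(sensitivity, specificity, prevalence):
    """Bayes-adjusted (PPV, NPV) for one prevalence value."""
    ppv = (sensitivity * prevalence) / (
        sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    )
    npv = (specificity * (1 - prevalence)) / (
        specificity * (1 - prevalence) + (1 - sensitivity) * prevalence
    )
    return ppv, npv


# One row per population, ordered from lowest to highest prevalence.
rows = [
    (label, prevalence, *predictive_values(SENSITIVITY, SPECIFICITY, prevalence))
    for label, prevalence in sorted(PREVALENCES.items(), key=lambda item: item[1])
]

print(f"{'population':<26}{'prevalence':>12}{'PPV':>8}{'NPV':>8}")
for label, prevalence, ppv, npv in rows:
    print(f"{label:<26}{prevalence:>12.2%}{ppv:>8.1%}{npv:>8.1%}")
```

With the classifier held constant, PPV rises and NPV falls monotonically as prevalence increases across these populations, which is the pattern the text attributes to the long-recognized influence of prevalence on predictive values.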
Our discussion underscores that the utility of a diagnostic classifier for a particular condition depends both on the discriminatory value of the classifier and on the prevalence of the condition in the population of interest. Accurate predictive values are particularly important when screening assessments or biological markers are considered as early indicators that lead to early identification of diseases. Failure to calculate predictive values precisely may lead to misleading conclusions about the utility of the assessment tools and/or biological markers for the professional community as well as the general population. Therefore, clinical classifiers should be developed in tandem with rigorous epidemiological studies to ascertain the prevalence of psychiatric conditions in both general and high-risk populations. Above all, the importance of accurate prevalence estimates must not be overlooked, as such estimates allow for the allocation of resources, including the use of screening assessments and biological markers, for efficient interventions/preventions to lessen disease burden.