Schizophrenia is a clinically heterogeneous syndrome with substantial heritability. Common small genetic risk factors (polygenic risk) collectively account for some 30% of heritability.Reference Purcell and Wray1 Copy number variants (CNVs) (DNA segments of >1 kilobase, present at higher (duplication) or lower (deletion) number than in a reference genome) have a larger role to play in a small subset of cases. There is evidence of genome-wide significant association for at least eight CNV loci in schizophrenia.Reference Marshall, Howrigan, Merico, Thiruvahindrapuram, Wu and Greer2 Individually these events are of moderate penetrance for schizophrenia, with reported odds ratios (ORs) of 2–30, which indicates that they are likely to have a substantial role in disease aetiology, at least for a small group of patients.Reference Kirov, Rees, Walters, Escott-Price, Georgieva and Richards3 Almost all of these CNVs are pleiotropic; for carriers of any one of the schizophrenia-associated (SCZ-associated) CNVs, the risk of developing any early developmental disorder (e.g. intellectual disability, autism spectrum disorder, developmental delay) is significantly higher than the risk of developing schizophrenia itself.Reference Kirov, Rees, Walters, Escott-Price, Georgieva and Richards3 Even in the absence of a psychiatric diagnosis, carriers of CNVs associated with schizophrenia have significant but variable cognitive deficits.Reference Kendall, Rees, Escott-Price, Einon, Thomas and Hewitt4 These CNVs are therefore potentially pathogenic and clinically significant, but outcomes range from subtle cognitive effects to severe neurodevelopmental disorders. Despite significant progress in our understanding of the genetics of schizophrenia, the process of translating SCZ genetic discovery into clinical impact is in its infancy.
Chromosomal microarray (CMA) testing is recommended as a first-tier genetic test in autism, developmental delay and intellectual disability.Reference Miller, Adam, Aradhya, Biesecker, Brothman and Carter5 Because a smaller proportion of people with schizophrenia (~2.5%) carry known pathogenic CNVs, routine testing is not currently recommended. However, a genetic diagnosis may be empowering for patients and their families; it can also inform screening for relevant medical comorbidities and help in reproductive planning.Reference Schaefer and Mendelsohn6 The identification of clinical symptoms or demographic features that differentiate people with schizophrenia who carry SCZ-associated CNVs may be helpful in clarifying who might benefit most from testing. On the basis of the known overlap with other neurodevelopmental disorders and previously reported phenotype studiesReference Kirov, Rees, Walters, Escott-Price, Georgieva and Richards3,Reference Philip and Bassett7–Reference Costain, Lionel, Fu, Stavropoulos, Gazzellone and Marshall15 we hypothesised that individuals with schizophrenia who carry SCZ-associated CNVs are more likely to have phenotypic features suggestive of pre-existing neurodevelopmental compromise, earlier onset of psychotic symptoms or a positive family history of neurodevelopmental disorder. The objective of this work was to determine whether clinically identifiable phenotypic features could be used to model SCZ-associated CNV carrier status in a large schizophrenia cohort.
Selection of phenotypic variables
A literature review was conducted in PubMed from January 2008 to February 2016 to identify clinical and phenotypic features reported to be associated with copy number variation in schizophrenia using the search terms ‘schizophrenia’, ‘copy number variant’ and ‘phenotype’. Publications that specifically described CNV-associated clinical and phenotypic features in schizophrenia were selected to identify neurodevelopmental phenotypic categories. Identified phenotypic domains included early onset of psychosis; premorbid cognitive difficulties; delays in developmental milestones; family history of neurodevelopmental disorder; and syndromal characteristics (dysmorphic features, congenital malformations). Through expert clinical consensus, eight specific features falling within these domains were selected because they are readily identifiable in a standard clinical evaluation and are therefore likely to have clinical utility and acceptability. Subsequently, ‘dysmorphic features’ and ‘congenital anomalies’ were excluded because reliable identification of these features requires additional training or clinical tools.Reference Miles, Takahashi, Hong, Munden, Flournoy and Braddock16 The phenotypic variables selected for analysis are outlined in Table 1.
ASD, autism spectrum disorder.
a. Fisher's exact tests were used to assess associations. Significant results are in bold.
The discovery data-set
The discovery data-set consisted of 1215 individuals of Irish ancestry for whom both clinical phenotype and genome-wide SNP array data were available.17 The individuals were all over 18 years of age and had a diagnosis of schizophrenia or schizoaffective disorder after a structured clinical assessment (as described by First et al Reference First, Spitzer, Gibbon and Williams18). Written informed consent was obtained from all participants. Diagnosis was made on the basis of the consensus lifetime best estimate method using all available information (interview, family or staff report, chart review) with DSM-IV criteria as per the Structured Clinical Interview for DSM-IV-TR Axis I Disorders, research version, patient edition (SCID-I/P). Each referral centre obtained local research ethics committee (REC) approval. There was a preponderance of males in this sample (64%).
Phenotypic data were collected retrospectively from an existing research cohort.17 The phenotypic data were collected from the SCID-I/P and consisted of interview self-reports. The definitions applied to identify a positive history of the phenotypic variables are outlined in Table 1. Phenotypic data were coded as categorical variables (missing information is described in supplementary Table 1, available at https://doi.org/10.1192/bjp.2019.262).
The replication data-set
The replication data-set was obtained from 19 879 schizophrenia cases published by the Schizophrenia Working Group of the Psychiatric Genomics Consortium (PGC) cohorts (representing 40 cohorts excluding data on Irish individuals).Reference Marshall, Howrigan, Merico, Thiruvahindrapuram, Wu and Greer2 Contributors of the constituent data-sets were approached to request access to additional phenotypic data to replicate the discovery findings. Only one cohort (the Cardiff data-set) was identified with the requisite phenotype data and adequate sample size for replication (many of the well-phenotyped cohorts were small and consequently had no CNV carriers).
The Cardiff data-set (n = 479) consisted of participants from the previously reported Cardiff Cognition in Schizophrenia (CardiffCOGS) study.Reference Hamshere, Walters, Smith, Richards, Green and Grozeva19,Reference Rees, Walters, Georgieva, Isles, Chambert and Richards20 In brief, the sample was recruited with REC approval from community, in-patient and voluntary-sector mental health services in the UK. Written informed consent was obtained from all participants. Participants had a clinical diagnosis of schizophrenia and were interviewed using the Schedules for Clinical Assessment in Neuropsychiatry (SCAN) and case-note review to derive a best-estimate lifetime diagnosis according to DSM-IV criteria. Similar to the discovery set, there was a preponderance of males in the sample (61.2%). The comparable phenotype variables investigated in the Cardiff data-set were: (a) ‘history of developmental delay’, which was directly comparable to the Irish data-set variable and was defined as ‘clinically relevant delays in speech, walking, coordination or diagnosed developmental problem’ and (b) a positive history of epilepsy, intellectual disability and/or autism spectrum disorder, which was included as ‘comorbid neurodevelopmental diagnosis’. Intellectual disability referred to an IQ <70 and clinical specialist service involvement. The autism spectrum disorder and epilepsy variables were interview self-report of a clinical diagnosis. Missing information is described in supplementary Table 2. The other phenotypic variables selected for the initial analysis were not collected in this data-set.
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. All procedures involving human participants were approved by the relevant local research ethics committees in Ireland and the UK, as outlined above.
The target CNVs used in the analysis were fifteen CNVs with the strongest evidence of association with schizophrenia (supplementary Table 3) analysed by Rees et al.Reference Rees, Walters, Georgieva, Isles, Chambert and Richards20 Twelve of these were also identified in the large PGC CNV meta-analysisReference Malhotra and Sebat21 and the other three were exon-disrupting deletions at the NRXN1 gene, deletion at distal 16p11.2 and duplications at the Williams–Beuren region identified on the basis of expert consensus or evidence published after the meta-analysis.Reference Rees, Walters, Georgieva, Isles, Chambert and Richards20
Genotyping and CNV calling
The Irish sample was genotyped on the Affymetrix 6.0 array (n = 802) or the Illumina HumanCoreExome chip (n = 413) (full details are available in the literature17). The Cardiff sample was genotyped using HumanOmniExpress-12v1-1_B arrays (Illumina).Reference Rees, Kendall, Pardiñas, Legge, Pocklington and Escott-Price22 To control for platform effects, raw intensity data were provided to the PGC CNV analysis group. This provided a centralised pipeline for systematic CNV calling including multiple CNV callers run in parallel. The final CNV set was defined as those >20 kb in length and including at least 10 probes and <1% minor allele frequency (MAF).Reference Marshall, Howrigan, Merico, Thiruvahindrapuram, Wu and Greer2
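The inclusion thresholds for the final CNV set can be expressed as a simple filter. The following is a minimal sketch only: the record layout (`CnvCall` and its field names) and the example calls are hypothetical, and the actual PGC pipeline is not described in enough detail here to reproduce.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CnvCall:
    # Hypothetical record layout; the real pipeline's schema is not given in the text.
    locus: str
    start_bp: int
    end_bp: int
    n_probes: int
    frequency: float  # fraction of samples carrying the call

# Thresholds as stated in the text: >20 kb, at least 10 probes, <1% frequency.
MIN_LENGTH_BP = 20_000
MIN_PROBES = 10
MAX_FREQUENCY = 0.01

def passes_filters(call: CnvCall) -> bool:
    """Return True if a call meets all three stated inclusion criteria."""
    length_bp = call.end_bp - call.start_bp
    return (
        length_bp > MIN_LENGTH_BP
        and call.n_probes >= MIN_PROBES
        and call.frequency < MAX_FREQUENCY
    )

# Invented example calls, for illustration only.
calls = [
    CnvCall("example_A", 1_000_000, 1_500_000, 250, 0.002),  # passes
    CnvCall("example_B", 2_000_000, 2_010_000, 40, 0.001),   # too short
    CnvCall("example_C", 3_000_000, 3_100_000, 6, 0.001),    # too few probes
    CnvCall("example_D", 4_000_000, 4_200_000, 90, 0.030),   # too common
]
kept = [c for c in calls if passes_filters(c)]
print([c.locus for c in kept])
```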
Univariate analyses (Fisher's exact tests) were performed first, to assess associations between phenotypic predictors and SCZ-associated CNV status in the Irish cohort. Multiple logistic regression analysis was then carried out to examine the effects of significant phenotypic variables, identified on univariate analysis, in modelling SCZ-associated CNV status. The final independent variables included in the model were those with a significance level of 0.05 following backward elimination steps. Model fit was assessed using the Nagelkerke pseudo R² index.
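For readers unfamiliar with the univariate step, a two-sided Fisher's exact test on a 2×2 table (carrier status by presence of a phenotypic feature) can be sketched as below. This is an illustrative standard-library implementation, not the R code used in the study, and the counts in the example are invented.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)

# Invented counts, for illustration only:
# rows = CNV carriers / non-carriers, columns = feature present / absent.
p = fisher_exact_two_sided(6, 13, 90, 1106)
print(f"two-sided P = {p:.4g}")
```

Because carriers are rare, several cells in such tables are small, which is the usual reason for preferring an exact test over a chi-squared approximation.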
Receiver operating characteristic (ROC) curve analysis was used to test the validity, sensitivity and specificity of the logistic regression parameters for modelling SCZ-associated CNV carrier status in the Irish discovery data-set.
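The quantities reported from ROC analysis can be illustrated with a short sketch. The AUROC is computed here as the probability that a randomly chosen carrier receives a higher model score than a randomly chosen non-carrier (ties count as one half), and sensitivity and specificity are read off at a chosen cut-off. The scores and labels are toy values, and the function names are ours rather than those of any package used in the study.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one carrier and one non-carrier")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

def sensitivity_specificity(scores, labels, cutoff):
    """Classify score >= cutoff as 'carrier' and return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Toy data: 1 = carrier, 0 = non-carrier.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
print(auroc(toy_scores, toy_labels))
print(sensitivity_specificity(toy_scores, toy_labels, cutoff=0.5))
```

With only a few binary predictors the model produces a small number of distinct scores, so many carrier/non-carrier pairs are tied; the half-credit for ties matters in that setting.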
The Cardiff replication data-set included data on two of the phenotypic variables of interest. A multiple logistic regression model including these two variables was trained from the Irish discovery data-set and then applied to the Cardiff data-set. ROC curve analysis was used to assess the accuracy of the neurodevelopmental variables in modelling SCZ-associated CNV carrier status in the replication data-set.
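The replication step amounts to freezing the coefficients estimated in the discovery data-set and scoring each replication participant without refitting. A minimal sketch follows; the coefficient values are hypothetical placeholders (the fitted values are in the supplementary tables and are not reproduced here), as are the example participants.

```python
from math import exp

# Hypothetical coefficients standing in for the model fitted on the
# discovery data-set; they are NOT the study's estimates.
INTERCEPT = -4.5
BETA_DEVELOPMENTAL_DELAY = 1.6
BETA_COMORBID_NDD = 1.7

def carrier_probability(developmental_delay: bool, comorbid_ndd: bool) -> float:
    """Predicted probability of carrier status from the frozen two-variable model."""
    linear_predictor = (
        INTERCEPT
        + BETA_DEVELOPMENTAL_DELAY * developmental_delay
        + BETA_COMORBID_NDD * comorbid_ndd
    )
    return 1.0 / (1.0 + exp(-linear_predictor))

# Invented replication participants: (developmental delay, comorbid diagnosis).
replication = [(True, False), (False, False), (True, True)]
replication_scores = [carrier_probability(*row) for row in replication]
print([round(s, 3) for s in replication_scores])
```

Scores produced this way can then be passed to a ROC analysis in the replication data-set, so that discrimination is assessed on data the model has not seen.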
The results presented are not corrected for multiple comparisons and all analyses were completed in R version 3.2.3 for Windows.23
From the total sample of 1215 individuals in the Irish discovery data-set, 19 (1.6%) carried one of the 15 identified SCZ-associated CNVs.Reference Rees, Walters, Georgieva, Isles, Chambert and Richards20 No individuals carried more than one SCZ-associated CNV. The details of the CNVs and positions are listed in supplementary Table 4. The proportions of individuals with a positive history of phenotypic variables and SCZ-associated pathogenic CNV status are available in supplementary Table 5.
Univariate analyses identified four phenotypic variables with significant associations with SCZ-associated CNV status: ‘history of developmental delay’, ‘comorbid neurodevelopmental disorder’, ‘history of learning difficulties’ and ‘specific learning disorder’ (Table 1). A multiple logistic regression model was fitted using these four variables. The variables ‘history of learning difficulties’ and ‘specific learning disorder’ were correlated (phi coefficient φ = 0.22) and were likely capturing similar phenotypic information. Backward elimination at this point removed the variable ‘history of learning difficulties’ from the model. The final independent variables in the model were ‘history of developmental delay’, ‘comorbid neurodevelopmental disorder’ and ‘specific learning disorder’. These variables had odds ratios of 5.19 (95% CI 1.58–14.76, P = 0.003), 5.87 (95% CI 1.28–19.69, P = 0.009) and 8.12 (95% CI 1.16–34.88, P = 0.012) respectively when included in the logistic regression model (Table 2). Nagelkerke pseudo R² for the model was 0.196, indicating that the phenotypic variables accounted for 19.6% of the variance in SCZ-associated CNV status in this sample.
a. Predictor coefficients were tested using Wald tests and confidence intervals were obtained using the Wald method. Nagelkerke pseudo R² = 0.196.
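For reference, the Nagelkerke index is a rescaling of the Cox–Snell pseudo R² so that its maximum is 1. In the standard definition, with L_0 the likelihood of the intercept-only model, L_1 the likelihood of the fitted model and n the number of observations (symbols ours, not the paper's):

```latex
R^2_{\mathrm{CS}} = 1 - \left( \frac{L_0}{L_1} \right)^{2/n},
\qquad
R^2_{\mathrm{N}} = \frac{R^2_{\mathrm{CS}}}{1 - L_0^{2/n}}
```

As a pseudo R², the index is an analogue of explained variance for logistic models rather than a literal proportion of variance.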
The performance of the three significant independent variables in modelling SCZ-associated CNV carrier status was tested using ROC curve analysis. The area under the ROC curve (AUROC) was 74.2% (95% CI 61.9–86.4%), with a sensitivity of 58.8% (95% CI 32.9–81.6%) and a specificity of 89.1% (95% CI 87.1–90.9%) at the optimal cut-off (Table 3).
AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value.
a. Optimal cut-off value, sensitivity, specificity, AUC and predictive values were calculated using three independent variables (‘history of developmental delay’, ‘comorbid neurodevelopmental disorder’, ‘specific learning disorder’).
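Predictive values depend on carrier prevalence as well as on sensitivity and specificity. The sketch below applies Bayes' rule to the discovery-sample sensitivity and specificity quoted in the text; the prevalence of 19/1215 is an assumption made for illustration (the analysed sample may differ because of missing phenotype data), so the output should not be read as a reproduction of Table 3.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Sensitivity and specificity as quoted in the text for the discovery sample;
# prevalence is an ASSUMPTION (19 carriers among 1215 individuals).
ppv, npv = predictive_values(sensitivity=0.588, specificity=0.891, prevalence=19 / 1215)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

The general point is that when carriers are rare, even a fairly specific rule yields a modest PPV, while the NPV stays high.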
Eight individuals (1.7%) in the Cardiff replication data-set (n = 479) carried one of the 15 identified risk CNVs, including one 1q21.2 duplication, one NRXN1 deletion, one Williams–Beuren region duplication, three 15q11.2 deletions and two 22q11.2 deletions. No individual carried more than one of these CNVs.
The Cardiff replication data-set included data on two of the phenotypic variables of interest: ‘history of developmental delay’ and ‘comorbid neurodevelopmental disorder’. The Irish discovery data-set was used to build a multiple logistic regression model using these two variables (supplementary Tables 6 and 7). Applying this model to the Cardiff study population gave an AUROC of 83% (95% CI 52.0–100.0%) in identifying SCZ-associated CNV status. The sensitivity and specificity were 75.0% (95% CI 19.4–99.4%) and 97.6% (95% CI 95.1–99.0%) respectively (Table 4).
AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value.
a. Optimal cut-off value, sensitivity, specificity, AUC and predictive values were calculated using two independent variables (‘history of developmental delay’, ‘comorbid neurodevelopmental disorder’).
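The width of the sensitivity interval in the replication sample reflects how few carriers contribute to it. As an illustration, an exact (Clopper–Pearson) binomial interval can be computed as below. The text does not state which interval method was used, and the counts in the example (3 of 4) are an illustrative assumption, not figures reported in this section.

```python
from math import comb

def binomial_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))

def _bisect(func, target: float, increasing: bool) -> float:
    """Solve func(p) = target on [0, 1] for a monotone func."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if (func(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def clopper_pearson(k: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion k/n."""
    if k == 0:
        lower = 0.0
    else:
        # P(X >= k | p) rises with p; find where it equals alpha / 2.
        lower = _bisect(lambda p: 1.0 - binomial_cdf(k - 1, n, p), alpha / 2.0, True)
    if k == n:
        upper = 1.0
    else:
        # P(X <= k | p) falls with p; find where it equals alpha / 2.
        upper = _bisect(lambda p: binomial_cdf(k, n, p), alpha / 2.0, False)
    return lower, upper

# Illustrative counts only: 3 of 4 carriers correctly identified.
lower, upper = clopper_pearson(3, 4)
print(f"{lower:.3f} to {upper:.3f}")
```

With a denominator this small, the interval spans most of the unit range, which is why replication estimates of sensitivity should be read cautiously.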
We investigated whether phenotype information generated by a standard clinical assessment could identify people with schizophrenia at greater risk of carrying pathogenic CNVs. In a discovery cohort of 1215 people with schizophrenia, having a specific learning disorder (OR = 8.12, P = 0.012), developmental delay (OR = 5.19, P = 0.003) or a comorbid neurodevelopmental disorder (OR = 5.87, P = 0.009) successfully modelled positive carrier status for a SCZ-associated CNV. Other clinical features, such as early onset of psychosis, low educational attainment and a family history of neurodevelopmental disorders, were not associated with SCZ-associated CNV carrier status in this cohort. The three ‘neurodevelopmental’ variables showed a relatively high specificity (89.1% (95% CI 87.1–90.9%)) but more modest sensitivity (58.8% (95% CI 32.9–81.6%)) in modelling carrier status for a SCZ-associated CNV in the Irish discovery sample. Information on ‘specific learning disorders’ was not available for the Cardiff replication sample. On the basis of the remaining two variables, ‘comorbid neurodevelopmental disorder’ and ‘history of developmental delay’, we applied a model from the original data-set to the Cardiff sample. This too showed relatively high specificity (97.6% (95% CI 95.1–99.0%)) but more modest sensitivity (75.0% (95% CI 19.4–99.4%)) in modelling carrier status for a SCZ-associated CNV.
Recent studies have suggested that identifying people with schizophrenia who have comorbid intellectual disability is likely to be helpful in identifying subsets of individuals with genomic disorders. Thygesen and colleagues reported an approximately three-fold higher rate of pathogenic CNVs in people with psychosis and intellectual disability compared with rates in the general schizophrenia population.Reference Thygesen, Wolfe, McQuillin, Viñas-Jornet, Baena and Brison24 Lowther et al examined the genome-wide burden of pathogenic CNVs in a schizophrenia cohort (n = 546) and demonstrated a significantly higher burden of pathogenic CNVs (OR = 5.01, P = 0.0001) in people with schizophrenia and low IQ (IQ <85) compared with those with average IQ (IQ ≥85). On the basis of their findings, the authors concluded that individuals with schizophrenia and low IQ should be prioritised for clinical microarray testing in clinical and research contexts.Reference Lowther, Merico, Costain, Waserman, Boyd and Noor25 We believe that our study provides further support for this recommendation, but that other developmental indices, which could be captured by a clinical neurodevelopmental history, should also be considered in the development of any future guidelines.
A small subset of people with schizophrenia (~2.5%) carry CNVs that substantially increase the risk for schizophrenia but also for other neurodevelopmental disorders. The clinical benefits of identifying such people have been demonstrated for other neurodevelopmental disorders.Reference Miller, Adam, Aradhya, Biesecker, Brothman and Carter5,Reference Schaefer and Mendelsohn6 Similar benefits are likely to apply in schizophrenia, but as these events are rare, routine genetic testing for all individuals is probably not indicated. Previous studies suggest that targeting people with schizophrenia and comorbid intellectual disability is likely to be more fruitful in identifying such cases.Reference Thygesen, Wolfe, McQuillin, Viñas-Jornet, Baena and Brison24,Reference Lowther, Merico, Costain, Waserman, Boyd and Noor25 Our findings suggest that careful clinical history taking to document developmental delay, reported learning disorders or a comorbid diagnosis of autism spectrum disorder or epilepsy may also be informative in screening for people with schizophrenia at higher risk of carrying known SCZ-associated CNVs.
These are rare events, but very large cohorts of genotyped people with schizophrenia are available and it is likely that whole genome sequence analysis of >30 000 such individuals will soon be completed. As these data are analysed the subset of people with schizophrenia who carry rare mutations and CNVs of likely clinical significance will increase, as has been the case for other neurodevelopmental disorders. Regrettably, there is a dearth of phenotype information available from many of the contributory cohorts. We strongly support efforts by the PGC to collect and standardise such phenotype information where it is available. For future cohorts, having detailed phenotype information together with neurodevelopmental and medical history will likely be helpful in refining predictor variables that ultimately may inform guidelines for genetic testing for people with schizophrenia.
Strengths and limitations
The strength of our study lies in the fact that we were able to build a well-characterised phenotype data-set, based on extensive clinical and research data compiled from previous schizophrenia research studies. We were able to test multiple phenotypic features for potential in identifying pathogenic CNV status and identify three variables that are easily clinically identified and that show considerable promise in identifying a high-risk group.
Recurrent SCZ-associated CNVs are rare events (~1:150–1:1000)Reference Kirov, Rees, Walters, Escott-Price, Georgieva and Richards3 and individual cohorts are likely to identify only a modest number of known CNVs, as demonstrated in our sample of 1215 people with schizophrenia. The study highlighted the relative limitations of phenotypic information across schizophrenia cohorts and suggested phenotypes derived from a standard clinical interview that could inform future studies. Further analysis of a wider psychosis population and other cross-disorder analyses are also likely to be valuable.
Our discovery and replication cohorts used retrospective phenotypic data from which we identified variables that provided estimates of sensitivity and specificity for modelling SCZ-associated CNV carrier status. Significantly larger, well-characterised phenotypic samples (e.g. prospective cohorts) will be required to provide more refined estimates of sensitivity and specificity to inform genetic screening guidelines. It will be important to consider the patient and family perspective to inform any future guidelines for genetic testing, but that was beyond the scope of the current investigation.
This work was supported by: the National Institutes of Health (J.S., MH119746 and MH109501; A.C., NIH grant MH109501); the National Institute of Mental Health (A.C., NIMH grants MH 41953 and MH083094); Science Foundation Ireland (A.C., 12.IP.1359 and 08/IN.1/B1916); and the Wellcome Trust Case Control Consortium 2 (A.C., 085475/B/08/Z and 085475/Z/08/Z).
We thank all the study participants, participating professionals, investigators and recruitment sites. We gratefully acknowledge the work of the Schizophrenia Working Group of the Psychiatric Genomics Consortium, whose ongoing collaborative efforts are essential to continuing progress in understanding the genetic architecture of schizophrenia.
Data are available from the authors on reasonable request.
C.F., E.A.H., M.G., L.G. and A.C. were responsible for the study conception and design. C.M., E.K., C.F. and A.D. were responsible for collection and coding of primary phenotypic data from the discovery data-set. C.M., E.K., D.H., D.M., C.P., P.C. and G.D. were responsible for the collection and processing of the genetic data from the discovery data-set. J.W., M.O. and M.O'D. contributed the data for the replication data-set and provided feedback with regard to the data analysis and interpretation. J.S. contributed to the core analysis of the genetic data used in the study. C.F., E.A.H., L.G. and A.C. contributed to the data analysis and interpretation and drafted the manuscript. All authors reviewed and approved the final manuscript.
Supplementary material is available online at https://doi.org/10.1192/bjp.2019.262.