Schizophrenia is a severe mental disorder accompanied by delusions, hallucinations, and cognitive impairment. It affects nearly 1% of the world’s population, and its biological underpinnings have remained elusive despite decades of intensive research [Reference Dhindsa and Goldstein1]. One important theory about its etiology is the dysconnectivity hypothesis, which proposes that the aberration of neural circuits during neural development plays a crucial role in the disease process [Reference Andreasen, Paradiso and O’Leary2]. The development of functional connectivity (FC) analysis [Reference Biswal, Yetkin, Haughton and Hyde3,Reference Greicius, Krasnow, Reiss and Menon4] provides an optimal tool to test the hypothesis, and studies have consistently identified FC abnormalities in widespread cortical and subcortical structures, including the anterior cingulate cortex [Reference Tu, Buckner, Zollei, Dyckman, Goff and Manoach5], thalamus [Reference Tu, Lee, Chen, Hsu, Li and Su6,Reference Anticevic, Haut, Murray, Repovs, Yang and Diehl7], basal ganglia [Reference Tu, Hsieh, Li, Bai and Su8,Reference Karcher, Rogers and Woodward9], and cerebellum [Reference Chen, Tu, Lee, Chen, Li and Su10] in patients with schizophrenia. With advancements in machine learning in medical imaging, researchers have further explored the use of brain-wide FCs, based on a specific anatomical or functional parcellation, as features for single-subject prediction of patients with schizophrenia [Reference Du, Fu and Calhoun11]. Early studies based on small samples reported classification performances of 93.2% (44 participants) [Reference Tang, Wang, Cao and Tan12] and 83% (56 participants) [Reference Arbabshirani, Kiehl, Pearlson and Calhoun13]. Several more recent studies have included larger samples. For example, Zhao et al. 
[Reference Zhao, Guo, Linli, Yang, Lin and Tsai14] included 283 participants (135 with schizophrenia and 148 healthy controls) and obtained an accuracy of 71% based on FC features, and Kalmady et al. [Reference Kalmady, Greiner, Agrawal, Shivakumar, Narayanaswamy and Brown15] included 174 participants (81 with drug-naïve schizophrenia and 93 healthy controls) and reported an accuracy of 87% with ensemble learning. Lei et al. [Reference Lei, Pinaya, van Amelsvoort, Marcelis, Donohoe and Mothersill16] evaluated five datasets of 112–192 participants and noted an average accuracy of 82.61% (77.1–87.3%). Together, these preliminary findings indicate that machine learning models are feasible for automated diagnosis of schizophrenia. However, the accuracy range varied substantially across these studies, and the relatively small sample size in many of them limited the application of the models in real-world clinical settings.
Sample size plays a key role in machine learning. A large single-site sample, which naturally covers more variation in disease features, has been suggested to support more robust classification models for real-world application than other sample types [Reference Krystal and State17–Reference Rashid and Calhoun19]. However, several reviews of previous machine learning studies have observed a negative correlation between sample size and accuracy [Reference Schnack and Kahn18,Reference Varoquaux20], with high-accuracy predictions usually limited to studies with small samples [Reference Arbabshirani, Plis, Sui and Calhoun21]. One explanation is that sample size influences the trade-off between accuracy and generalizability [Reference Schnack and Kahn18]: small, homogeneous samples can produce classification models with high accuracy at the cost of low generalizability, whereas large, heterogeneous samples produce models with better generalizability at the cost of accuracy. However, recent simulation and empirical studies have highlighted the critical role of biased estimation in machine learning studies with small sample sizes. The high accuracy may have been obtained because of the inherently large variance of performance in studies with small samples, publication bias in reporting significant effects [Reference Varoquaux20], and biased validation processes with a limited sample size [Reference Vabalas, Gowen, Poliakoff and Casson22]. Notably, the popular K-fold cross-validation method produces strongly biased performance estimates with small samples when it does not ensure that the data used to validate the classifier are kept apart from the data used to train it [Reference Vabalas, Gowen, Poliakoff and Casson22]. Therefore, it remained unclear whether high accuracy can be achieved for the identification of patients with schizophrenia based on a large heterogeneous sample with the current approach.
In the present study, we used a large single-site resting fMRI dataset of 220 patients with schizophrenia and 220 healthy controls to develop machine learning models for the automatic identification of patients with schizophrenia based on brain-wide FCs and tested the hypothesis that support vector machines (SVMs) based on larger, heterogeneous samples can also provide high classification accuracy. SVMs were adopted as the machine learning models because they were the most commonly used models in recent machine learning studies of psychiatric patients [Reference Zhao, Guo, Linli, Yang, Lin and Tsai14, Reference Vabalas, Gowen, Poliakoff and Casson22–Reference Lei, Pinaya, Young, van Amelsvoort, Marcelis and Donohoe24] and showed performance superior to that of other traditional models [Reference Lei, Pinaya, van Amelsvoort, Marcelis, Donohoe and Mothersill16]. Moreover, our sample size is too small for the application of deep learning algorithms [Reference Cearns, Hahn and Baune25]. To the best of our knowledge, this is the largest single-site machine learning study of patients with schizophrenia based on brain-wise FCs to date. The data were collected using the same MRI machine and acquisition parameters from 2010 to 2019, thereby minimizing the confounding effects of medical center, MRI machine, and acquisition parameters. Given previous concerns regarding the effects of participant homogeneity and training sample size on classification accuracy, we also investigated the effect of these two factors on model performance.
Materials and Methods
The resting fMRI data set included 220 patients with schizophrenia and 220 sex- and age-matched healthy controls. Their demographic characteristics are presented in Table 1. All patients were recruited from outpatient and inpatient units of the Taipei Veterans General Hospital in Taiwan. Structured clinical interviews based on the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV) [Reference Cearns, Hahn and Baune26] confirmed the diagnoses, and the clinical status of the patients was characterized using the Positive and Negative Syndrome Scale (PANSS) [Reference First, Spitzer, Gibbon and Williams27]. We excluded participants with the following conditions: (a) substance abuse or dependency in the preceding 6 months; (b) a history of head injury that resulted in sustained loss of consciousness or cognitive sequelae; and (c) neurological illnesses or any other disorder that affects cerebral metabolism. Of these patients, seven had comorbid psychiatric disorders (detailed in Supplementary Table S1). The patients with schizophrenia were under stable treatment with various antipsychotics, antidepressants, and mood stabilizers before participating in the study.
Abbreviations: F, female; HC, healthy control; M, male; PANSS, Positive and Negative Syndrome Scale for Schizophrenia; SZ, schizophrenia.
Healthy controls were recruited through advertisements; they were screened by an experienced psychiatrist with the Mini International Neuropsychiatric Inventory Plus, and candidates with a possible major psychiatric illness were excluded. In addition, candidates with a history of first-degree relatives with axis-I disorders, including schizophrenia, major depressive disorder, and bipolar disorder, were excluded.
MRI image acquisition
MRI images were acquired using a 3.0 Tesla GE Discovery 750 whole-body high-speed imaging device with an eight-channel high-resolution brain coil. Head stabilization was achieved through cushioning, and all participants wore earplugs (29 dB rating) to attenuate noise. Automated shimming procedures were performed, and scout images were obtained. Resting-state functional images were collected using a gradient echo T2* weighted sequence (repetition time [TR]/echo time [TE]/flip angle = 2,500 ms/30 ms/90°). Forty-seven contiguous horizontal slices parallel to the intercommissural plane (voxel size: 3.5 × 3.5 × 3.5 mm3) were acquired and interleaved. These slices covered the cerebellum of each participant. During functional scanning, the participants were instructed to remain awake with their eyes open (each scan lasted 8 min and 24 s across 200 time points). In addition, a high-resolution structural image was acquired in the sagittal plane using a high-resolution sequence (TR = 2,530 ms, echo spacing = 7.25 ms, echo time TE = 3 ms, flip angle = 7°) and an isotropic 1-mm voxel (field of view: 256 × 256).
Regarding head motion during image acquisition, we used the method of scrubbing within regression (spike regression) suggested by Yan et al. [Reference Kay, Fiszbein and Opler28] to minimize the effect of head motion on FC measurement. This method identifies “bad” time points using a threshold of framewise displacement (FD) > 0.2 mm as well as one back and two forward neighbors [Reference Yan, Cheung, Kelly, Colcombe, Craddock and Di Martino29]; each “bad” time point was modelled as a separate regressor in the regression models [Reference Power, Barnes, Snyder, Schlaggar and Petersen30,Reference Lemieux, Salek-Haddadi, Lund, Laufs and Carmichael31]. The detailed motion-correction parameters are provided in Supplementary Table S2; there was no significant difference in head motion between the two groups.
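The spike-regression bookkeeping described above can be sketched as follows; `spike_regressors` and its arguments are illustrative names, not code from the original pipeline:

```python
import numpy as np

def spike_regressors(fd, threshold=0.2, back=1, forward=2):
    """Build one regressor per 'bad' time point (FD > threshold),
    expanding each flagged frame by `back` preceding and `forward`
    following neighbors, as in scrubbing within regression."""
    n = len(fd)
    bad = set()
    for t in np.where(np.asarray(fd) > threshold)[0]:
        for k in range(t - back, t + forward + 1):
            if 0 <= k < n:
                bad.add(k)
    bad = sorted(bad)
    # One column per bad frame: 1 at that frame, 0 elsewhere.
    R = np.zeros((n, len(bad)))
    for j, t in enumerate(bad):
        R[t, j] = 1.0
    return R
```

Each column of the returned matrix is then entered as a separate nuisance regressor alongside the motion model.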
All preprocessing was performed using the Data Processing Assistant for Resting-State fMRI (http://www.restfmri.net), which is based on Statistical Parametric Mapping (http://www.fil.ion.ucl.ac.uk/spm) and the Resting-State fMRI Data Analysis Toolkit (http://www.restfmri.net). The functional scans received slice-timing correction and motion correction and were normalized to a standard anatomical space (Montreal Neurological Institute). Additional preprocessing steps were used to prepare the data for FC analysis. These were as follows: (a) spatial smoothing using a Gaussian kernel (6-mm full width at half-maximum), (b) temporal filtering (0.009 Hz < f < 0.08 Hz), and (c) removal of spurious or nonspecific sources of variance through regression of the following variables: (i) six head motion parameters and autoregressive models of motion, namely the six head motion parameters, the six head motion parameters one time point before, and the 12 corresponding squared items [Reference Satterthwaite, Elliott, Gerraty, Ruparel, Loughead and Calkins32] (Friston 24-parameter model); (ii) the mean whole-brain signal; (iii) the mean signal within the lateral ventricles; and (iv) the mean signal within a white matter mask. The regressors used in the method of scrubbing within regression were also included to minimize the effect of head motion on the measurement of FC. The regression of each of these signals was computed simultaneously, and the residual time course was then retained for the correlation analysis.
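As an illustration, the Friston 24-parameter expansion of the six motion parameters might be computed as below; the function name is hypothetical, and the zero row standing in for the undefined "one time point before" at the first frame is our own convention:

```python
import numpy as np

def friston24(motion):
    """motion: (timepoints, 6) head-motion parameters.
    Returns the Friston 24-parameter regressor matrix: the 6 parameters,
    their values one time point before, and both sets squared."""
    prev = np.vstack([np.zeros((1, 6)), motion[:-1]])  # lag-1 copy, zero-padded
    return np.hstack([motion, prev, motion ** 2, prev ** 2])
```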
Calculation of brain-wise FCs
We chose three parcellations: the automated anatomical labeling atlas version 3 (AAL-3) [Reference Friston, Williams, Howard, Frackowiak and Turner33], AAL-2 [Reference Rolls, Huang, Lin, Feng and Joliot34], and Shen’s 268 parcellations [Reference Rolls, Joliot and Tzourio-Mazoyer35], comprising 166, 120, and 268 regions of interest (ROIs), respectively (Figure 1). The mean time series were derived for each ROI by averaging the time course of all voxels within the ROI. Pearson’s correlation coefficients for each pair of ROIs were calculated and z-transformed, yielding three FC matrices (166 × 166, 120 × 120, and 268 × 268) for each participant. By evaluating the model performance based on the three parcellations, we aimed to choose the one yielding the best performance for later experiments. AAL-2 and AAL-3 were selected because the automated anatomical atlas [Reference Shen, Tokoglu, Papademetris and Constable36] has been widely used in neuroimaging research. Compared with AAL-2, AAL-3 has a more detailed parcellation of the thalamus (15 parts), and we wished to determine whether this finer parcellation improved model performance. Shen 268 was selected because it was defined using neuroimaging-based parcellation algorithms applied to FC data and had been adopted in our previous machine learning study of patients with bipolar disorder [Reference Tzourio-Mazoyer, Landeau, Papathanassiou, Crivello, Etard and Delcroix37].
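A minimal sketch of the FC computation for one participant, assuming the ROI-mean time series have already been extracted (the function name is our own):

```python
import numpy as np

def fc_features(ts):
    """ts: (timepoints, n_rois) array of ROI-mean time series.
    Returns the Fisher z-transformed lower-triangle FC vector."""
    r = np.corrcoef(ts.T)                            # Pearson r, (n_rois, n_rois)
    z = np.arctanh(np.clip(r, -0.999999, 0.999999))  # z-transform, clipped for safety
    iu = np.tril_indices_from(z, k=-1)               # strictly lower triangle
    return z[iu]
```

For the 166-ROI AAL-3 matrix this yields 166 × 165 / 2 = 13,695 features per participant.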
Machine learning model creation, training, and performance evaluations
SVM is a supervised learning model with an associated learning algorithm that analyzes data for classification [Reference Chen, Tu, Huang, Bai, Su and Chen38]. The lower-triangle elements of the FC matrix were concatenated into a vector per subject and used as discriminative features for classifier training. The hyperparameters C = (1, 10, 100, 1000) and tolerance = (0.001, 0.01, 0.1, 1) of the SVM were optimized using grid search with cross-validation within the training set. To classify an FC matrix in the test set, its classification output was considered positive for schizophrenia if the probability of class 1 (i.e., diagnosed as schizophrenia) exceeded a predefined threshold (i.e., 0.5).
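With scikit-learn, the grid search over C and tolerance plus the 0.5 probability threshold could look like the following sketch; the random data, linear kernel, and variable names are illustrative assumptions, not the study's actual configuration:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Hypothetical data: rows are per-subject FC feature vectors,
# labels are 1 for schizophrenia and 0 for healthy control.
rng = np.random.default_rng(0)
X = rng.standard_normal((60, 100))
y = np.array([0, 1] * 30)

# Grid over the hyperparameters named in the text.
param_grid = {"C": [1, 10, 100, 1000], "tol": [0.001, 0.01, 0.1, 1]}
grid = GridSearchCV(
    SVC(kernel="linear", probability=True),  # linear kernel assumed
    param_grid,
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
grid.fit(X, y)

# A test instance is called positive if P(class 1) exceeds 0.5.
proba = grid.predict_proba(X[:5])[:, 1]
pred = (proba > 0.5).astype(int)
```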
We used nested 10-fold cross-validation to evaluate the SVMs, with inner cross-validation for hyperparameter determination and outer cross-validation for performance evaluation [Reference Chang and Lin39]. The entire dataset was divided into 10 folds that preserved the relative proportion of the two classes (i.e., schizophrenia and healthy controls) according to the various experimental setups; nine folds were used as the training set, while the remaining fold was used as the test set. Each training set was again divided into 10 folds for inner cross-validation, in which 9 folds were used to train the model and the remaining fold was used for performance validation; this process was repeated 10 times until each fold had served as the validation set for hyperparameter determination. The outer cross-validation process was likewise repeated 10 times until each of the 10 folds had served as the test set. We repeated the experiment 100 times to avoid any bias introduced by random sampling in nested 10-fold cross-validation, and the mean ± standard deviation of the performance was reported.
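A compact sketch of this nested 10-fold scheme with scikit-learn, on hypothetical random data standing in for the FC feature matrix:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.standard_normal((60, 50))
y = np.array([0, 1] * 30)

inner = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# Inner loop: hyperparameter search restricted to the training folds.
clf = GridSearchCV(
    SVC(kernel="linear"),
    {"C": [1, 10, 100, 1000], "tol": [0.001, 0.01, 0.1, 1]},
    cv=inner,
)
# Outer loop: each fold serves once as the held-out test set, so the
# data used to score the classifier never touch hyperparameter tuning.
scores = cross_val_score(clf, X, y, cv=outer)
mean_acc, sd_acc = scores.mean(), scores.std()
```

Repeating this whole procedure with reshuffled folds (100 times in the study) gives the reported mean ± standard deviation.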
The performances were evaluated using the following metrics: (a) accuracy: the fraction of predictions the model got right; (b) sensitivity (true positive rate): the proportion of testing instances who received a positive result among participants who actually have schizophrenia; (c) specificity (true negative rate): the proportion of testing instances who received a negative result among participants who do not have schizophrenia; (d) F1-score: the harmonic mean of precision and recall (sensitivity), which takes both false positives and false negatives into account; and (e) area under the curve (AUC): an aggregate measure of performance across all possible classification thresholds, ranging from 0 (a model whose predictions are all wrong) to 1 (a model whose predictions are all correct).
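These five metrics can be computed with scikit-learn as follows; the toy labels and scores are invented solely to show the calls:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, recall_score, roc_auc_score)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]            # 1 = schizophrenia, 0 = control
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]            # thresholded predictions
y_score = [0.9, 0.4, 0.8, 0.2, 0.1, 0.6, 0.7, 0.3]  # class-1 probabilities

accuracy = accuracy_score(y_true, y_pred)
sensitivity = recall_score(y_true, y_pred)   # true positive rate
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)                 # true negative rate
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)         # threshold-free
```

Note that the AUC is computed from the continuous scores rather than the thresholded predictions, which is what makes it threshold-independent.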
First, we evaluated the classification performance of SVMs based on brain-wide FC of three parcellations, and the one with the best performance was selected for later experiments.
Next, we investigated whether increasing the homogeneity of the demographic properties of sex and age improved SVM performance. We divided the whole sample by sex (235 men: 120 with schizophrenia and 115 healthy controls; 205 women: 100 with schizophrenia and 105 healthy controls) and age (212 younger adults: 18–30 years, 106 with schizophrenia and 106 healthy controls; 228 older adults: 31–50 years, 114 with schizophrenia and 114 healthy controls). The SVMs based on these subsamples were evaluated using nested 10-fold cross-validation with 100 random samplings, and the mean ± standard deviation of the performance was reported. We also evaluated the generalizability of these SVMs to participants with different demographic characteristics. The male- or female-specific SVMs, trained on only male or only female participants, were used to classify the clinical status of the subsample of the other sex. In a similar way, we applied the SVMs trained on only younger adults to predict the clinical status of older adults, and those trained on only older adults to predict the clinical status of younger adults.
Finally, we evaluated the effect of training sample size on SVM performance to determine how many participants are necessary for a robust machine learning model. We randomly selected a test set of 40 participants (20 with schizophrenia and 20 healthy controls) and fixed the same test set in each testing group for performance comparisons. For the training sample size setups, we started with N = 40 (20 with schizophrenia and 20 healthy controls) randomly drawn from the other 400 participants and incrementally enlarged the training set by 20 patients with schizophrenia and 20 healthy controls at a time until the maximum training set size of N = 400 was reached. At each size, a model was built from the training set and tested on the fixed test set. We conducted 100 repetitions with different random samplings of participants, and the mean ± standard deviation of the performance was reported.
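The fixed-test-set, growing-training-set design might be sketched as follows on synthetic data (all names and the synthetic features are our own, not the study's data):

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

rng = np.random.default_rng(2)
# Synthetic stand-ins: a pool of 400 training candidates (200 per class)
# and a fixed, balanced test set of 40 participants.
X_pool = rng.standard_normal((400, 50))
y_pool = np.array([0, 1] * 200)
X_test = rng.standard_normal((40, 50))
y_test = np.array([0, 1] * 20)

accs = {}
for n in range(40, 401, 40):  # N = 40, 80, ..., 400
    # Draw a balanced training set of size n from the pool.
    idx0 = rng.choice(np.where(y_pool == 0)[0], n // 2, replace=False)
    idx1 = rng.choice(np.where(y_pool == 1)[0], n // 2, replace=False)
    idx = np.concatenate([idx0, idx1])
    clf = SVC(kernel="linear").fit(X_pool[idx], y_pool[idx])
    accs[n] = accuracy_score(y_test, clf.predict(X_test))
```

In the study this inner loop is wrapped in 100 repetitions with fresh random draws, from which the mean and standard deviation at each N are taken.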
The participants’ demographic data are presented in Table 1. We controlled the age and sex distribution of each group to ensure a balanced study design. Differences in demographic characteristics between the two groups were examined using the chi-square test for categorical variables and the t test for continuous variables. The mean ages of the patients with schizophrenia and healthy controls were 31.7 and 31.8 years, respectively. No significant differences were noted in age and sex distribution between the two groups. However, patients with schizophrenia had significantly lower education levels than healthy controls.
The performance of SVMs based on three different parcellations
The detailed results of SVM performance are presented in Table 2. The mean accuracies of the SVMs based on AAL-3, AAL-2, and Shen’s 268 parcellations were 85.05 ± 0.84%, 84.17 ± 0.88%, and 84.45 ± 0.89%, respectively. The SVMs based on AAL-3 had slightly but significantly higher accuracy than those based on AAL-2 (p < 0.01) or Shen’s 268 (p < 0.01). Therefore, brain-wide FC based on AAL-3 was adopted for later experiments.
Abbreviations: AAL-2, the automated anatomical labeling atlas version 2; AAL-3, the automated anatomical labeling atlas version 3; AUC, area under curve; SVM, support vector machine.
The effects of demographic homogeneity and training sample size on classification accuracy
The detailed demographic and clinical characteristics of the subsamples by sex or age range are provided in Supplementary Table S3. Sex-specific SVMs had accuracies of 84.66 ± 1.07% for men and 81.56 ± 1.27% for women (Table 3 and Figure 2a). We also evaluated the generalizability of the sex-specific SVMs to the participants of the other sex. The accuracy was 78.20% for predicting female participants by male-specific SVMs and 81.33 ± 1.23% for predicting male participants by female-specific SVMs (Table 4). Thus, the female-specific SVMs, which yielded an accuracy of 81.56% for predicting female participants, generalized well to predicting male participants with an accuracy of 81.33%, but not vice versa. Nevertheless, the sex-specific models had worse performance than the SVMs based on participants of both sexes.
Abbreviations: AUC, area under curve.
Abbreviations: AUC, area under curve; SVMs, support vector machines.
a The classification performance of predicting female participants by male-specific SVMs.
b The classification performance of predicting male participants by female-specific SVMs.
c The classification performance of predicting older-adult participants by younger-adult-specific SVMs.
d The classification performance of predicting younger-adult participants by old-adult-specific SVMs.
Age-specific SVMs yielded accuracies of 80.50 ± 1.38% and 86.13 ± 0.87% for younger and older adults (Table 3 and Figure 2a), respectively. We also evaluated the generalizability of the age-specific SVMs to the participants of the other age range. The accuracies for predicting younger participants using the older adult-specific SVMs and vice versa were 77.24 ± 1.07% and 82.93 ± 1.04%, respectively (Table 4). The younger adult-specific SVMs, which yielded an accuracy of 80.5% for predicting younger participants, generalized well to predicting older participants with an accuracy of 82.93%. By contrast, the older adult-specific SVMs, which yielded an accuracy of 86.13% for predicting older participants, generalized poorly to predicting the clinical status of younger participants, with an accuracy of 77.24%.
The relationship between classification accuracy and training sample size is shown in Table 5 and Figure 2b. As the training sample size increased from 40 to 400, the mean accuracy increased consistently from 72.61% to 83.32%, and an average accuracy of >81% was achieved for N > 240. According to the standard deviations of classification accuracy across the 100 rounds of random sampling, the SVMs based on larger training samples had lower variance in performance, suggesting higher stability.
Abbreviations: AUC, area under curve.
The FCs with greatest contributions to single subject classification
Identifying the FCs that contribute to accurately differentiating patients from control subjects provides a multivariate approach to biomarker discovery, which could lead to clinically useful tools for establishing both diagnosis and prognosis [Reference Krstajic, Buturovic, Leahy and Thomas40]. Therefore, we further analyzed the FCs contributing to classification performance. In each trained SVM, the absolute value of the weight of each brain-wise FC was regarded as its feature importance and averaged across all SVMs based on AAL-3 trained with the whole sample of N = 440. The top 20 FCs with the highest mean weights are listed in Table 6 and involve distributed cortical and subcortical structures (Figure 3). Among them, the thalamo-cerebellar FC had the highest mean weight and played the most important role in differentiating patients from controls.
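Under the assumption of a linear-kernel SVM (where per-feature weights are available), the weight-based feature importance described above can be sketched on synthetic data as:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n_features = 166 * 165 // 2  # 13,695 lower-triangle FCs for the 166-ROI AAL-3
X = rng.standard_normal((100, n_features))  # synthetic stand-in subjects
y = np.array([0, 1] * 50)

clf = SVC(kernel="linear").fit(X, y)
importance = np.abs(clf.coef_).ravel()     # |weight| per FC feature
top20 = np.argsort(importance)[::-1][:20]  # indices of the 20 strongest FCs
```

In the study, these absolute weights were additionally averaged across all trained SVMs before ranking.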
At present, psychiatric diagnoses are based largely on psychiatric interviews, and brain imaging does not play a vital role. However, the approach of combining imaging and machine learning is appealing and could be immensely useful if it is proven to be a robust means of establishing a psychiatric diagnosis. In this study, we used a large single-site dataset to build SVMs to classify patients with schizophrenia and healthy controls based on brain-wide FC, with an accuracy of 85%. In contrast to recent concerns about the biased estimation of classification performance in studies with small samples [Reference Flint, Cearns, Opel, Redlich, Mehler and Emden23], the present results may provide a robust estimation of SVM performance for the automatic diagnosis of patients with schizophrenia based on brain-wise FCs. On the basis of our data, we recommend AAL-3 for the calculation of brain-wide FC because it yielded higher classification accuracy than AAL-2 and Shen’s 268. Although the models using more homogeneous subsamples of a narrower age range (the older adult group) seemed to provide better classification accuracy than the overall model, they generalized poorly to samples with different demographic properties. We also found that classification accuracy increased with incremental increases in training sample size from 40 to 400, with an accuracy of >81% achieved with N > 240. These findings suggest that establishing an SVM based on a large single-site dataset covering varied demographics and disease features may be optimal for the automatic diagnosis of schizophrenia.
Our model had a mean accuracy of 85%, which is slightly better than those reported in recent machine-learning studies based on brain-wide FC: 82.4% [Reference Kalmady, Greiner, Agrawal, Shivakumar, Narayanaswamy and Brown15], 81.74% [Reference Lei, Pinaya, van Amelsvoort, Marcelis, Donohoe and Mothersill16], and 82.61% [Reference Gutiérrez-Gómez, Vohryzek, Chiêm, Baumann, Conus and Cuenod41]. Notably, the performance of these SVMs was highly consistent—between 80 and 85%—suggesting that brain-wide FC is a reliable feature for the automatic classification of patients with schizophrenia. The high accuracy (>90%) reported in early studies with small samples may have been due to high variability and over-optimistic estimation of accuracy during cross-validation within a small sample. A recent study systematically investigated this issue using structural MRIs of 1,868 patients with major depressive disorder and healthy controls from the international Predictive Analytic Competition [Reference Flint, Cearns, Opel, Redlich, Mehler and Emden23]. They mimicked the process by which researchers would draw samples of various sizes (N = 4–150) and concluded that there is a strong risk of misestimation: an accuracy of up to 95% can be observed with sample sizes of 20, mainly due to accuracy overestimation during cross-validation. They recommended using sufficiently large test sets to offset the performance misestimation.
Studies have rarely explored sex- and age-specific machine learning models. One diffusion spectrum imaging study that used a diagnostic index based on whole-brain patterns of altered white matter tract integrity did separate models by sex [Reference Chen, Liu, Hsu, Lo, Hwang and Hwu42]. The overall prediction accuracy was approximately 84% for men, 82% for women, and 76% for men and women together. The results implied that sex has a significant effect on structural connectivity patterns, and it may be helpful to establish different models for male and female participants to improve prediction performance. In our study, sex-specific SVMs performed worse than those based on both sexes. By contrast, the older adult-specific SVMs had slightly better performance than the SVMs based on all ages, but with poor generalization to younger participants. Therefore, it may be practical to establish SVMs based on participants covering various demographic properties in the clinical setting.
We noted that a larger sample size provided better performance and improved the reliability of the SVMs by decreasing performance variance. Our findings are consistent with a previous simulation study suggesting that a larger sample size may improve model stability [Reference Varoquaux20]. Several studies have also explored the relationship between training sample size and classification accuracy, but the results have exhibited some disagreement. One study trained SVMs based on structural MRI features and demonstrated a consistent increase in classification accuracy to approximately 70% with increases in sample size (N = 10, 20, 30, …, 220), and the accuracy appeared not to have reached its maximum. Another resting fMRI study used intersubject correlation of the functional connectome as features to classify patients with schizophrenia and reported higher performance with larger training samples [Reference Cui, Liu, Wang, Wang, Guo and Xi43]. By contrast, one study evaluated SVMs based on structural MRI to classify patients with major depressive disorder with variable training set sizes of N = 5–150 and reported no performance improvement for N > 30. Thus, the relationship between classification performance and training sample size may depend on the features (structural or functional) and the complexity of the algorithms, and a larger training sample may not always lead to better performance. Our findings indicate that performance continued to improve at N = 400; we therefore suggest increasing the sample size of the dataset even further with the current models.
The choice of brain parcellations has been rather arbitrary in previous machine learning studies using the brain-wide FC as features. AAL-3 is a recently announced brain parcellation [Reference Friston, Williams, Howard, Frackowiak and Turner33]. Compared with AAL-2, AAL-3 has 26 new regions, a new subdivision of the thalamus into 15 parts, and subdivision of the anterior cingulate cortex into subgenual, pregenual, and supracallosal parts. Given the critical role of the thalamocortical FC in schizophrenic disorder [Reference Tu, Lee, Chen, Hsu, Li and Su6,Reference Anticevic, Haut, Murray, Repovs, Yang and Diehl7,Reference Chen, Liu, Hsu, Lo, Hwang and Hwu44], finer parcellations of the thalamus in AAL-3 may have contributed to its higher performance in our study. Nevertheless, the SVMs based on the three parcellations all had high accuracy, thus supporting the reliability of the models.
Our study had several limitations. First, all our patients were receiving treatment with various antipsychotics, so the performance of our models in drug-naïve or first-presentation patients remains unclear; because diagnostic tools would be most valuable for first-presentation patients, this factor may limit the clinical application of our models. Second, our machine learning models used only brain-wise FCs and were thus limited to a single modality, whereas previous studies have suggested that multimodal techniques may provide superior performance [Reference Lei, Pinaya, Young, van Amelsvoort, Marcelis and Donohoe24,Reference Ji, Chen, Bai, Wang, Wei and Gao45]. Finally, our dataset was limited to a single site, precluding assessment of the cross-site generalization of our models. Models based on single-site datasets show much lower performance in cross-site generalization [Reference Woodward, Karbasforoushan and Heckers46,Reference Lin, Li, Dong, Wang, Sun and Shi47], likely due to confounding factors such as different MRI machines, acquisition parameters, and diagnostic processes. Future studies should explore the performance of SVMs using multimodal features in large single-site and multi-site datasets.
In this study, SVMs trained on brain-wide FCs retrieved from a large single-site dataset of patients with schizophrenia and healthy controls provided a classification accuracy of 85.05%. The results support the diagnostic value of brain-wise FCs in patients with schizophrenia with the largest single-site sample to date. The feature importance analysis found that the thalamo-cerebellar FC played the most important role in differentiating patients from controls and might serve as a potential neural biomarker for schizophrenia. AAL-3 is recommended for brain-wise FC construction. The use of more homogeneous subsamples of the same sex or age range did not provide better performance, and establishing SVMs on a large sample with heterogeneous properties is recommended for single-subject prediction of patients with schizophrenia.
To view supplementary material for this article, please visit http://doi.org/10.1192/j.eurpsy.2021.2248.
We gratefully thank all the participants who took part in this research and all the research assistants and staff who facilitated their involvement.
Data Availability Statement
The data that support the findings of this study are available from the authors.
Restrictions in relation to potentially person identifiable information apply.
The study was supported by grants from Taipei Veterans General Hospital (V99C1-040, V101C1-159, V104C-039, V105C-119, V106C-091, and V107C-100), Taiwan Ministry of Science and Technology (NSC 99-2628-B-010-021-MY2, MOST 105-2314-B-075-056-MY2, MOST 103-2314-B-075-065-MY2, and MOST 109-2314-B-075-062) and the Ministry of Science and Technology, Taiwan under the grant MOST 108-2218-E-008-017-MY3 and MOST 108-2634-F-008-003- through Pervasive Artificial Intelligence Research (PAIR) Labs, Taiwan.
Conceptualization: L.-H.L. and P.-C.T.; Formal analysis: L.-H.L., C.-H.C., W.-C.C., and P.-C.T.; Funding acquisition: L.-H.L., P.-L.L., K.-K.S., J.-W.H., Y.-M.B., and P.-C.T.; Investigation: L.-H.L., C.-H.C., W-C.C., M.-H.C., J.-W.H., Y.-M.B., T.-P.S., and P.-C.T.; Methodology: L.-H.L., C.-H.C., W.-C.C., M.-H.C., and P.-C.T.; Supervision: L.-H.L., P.-L.L., K.-K.S., Y.-M.B., and T.-P.S.; Validation: L.-H.L., C.-H.C., W.-C.C., and P.-C.T.; Project administration: C.-H.C., W.-C.C., and P.-C.T.; Resources: C.-H.C., W.-C.C., P.-L.L., M.-H.C., Y.-M.B., and P.-C.T.; Software: C.-H.C. and W.-C.C.; Data curation: C.-H.C., K.-K.S., J.-W.H., Y.-M.B., and P.-C.T.; Visualization: W.-C.C. and T.-P.S.; Writing – original draft: L.-H.L. and P.-C.T.; Writing – review & editing: L.-H.L., W.-C.C., and P.-C.T.
Conflicts of Interest
The authors declare that they have no conflict of interest.