
Equivalency of the diagnostic accuracy of the PHQ-8 and PHQ-9: a systematic review and individual participant data meta-analysis

  • Yin Wu, Brooke Levis, Kira E. Riehm, Nazanin Saadat, Alexander W. Levis, Marleine Azar, Danielle B. Rice, Jill Boruff, Pim Cuijpers, Simon Gilbody, John P.A. Ioannidis, Lorie A. Kloda, Dean McMillan, Scott B. Patten, Ian Shrier, Roy C. Ziegelstein, Dickens H. Akena, Bruce Arroll, Liat Ayalon, Hamid R. Baradaran, Murray Baron, Charles H. Bombardier, Peter Butterworth, Gregory Carter, Marcos H. Chagas, Juliana C. N. Chan, Rushina Cholera, Yeates Conwell, Janneke M. de Man-van Ginkel, Jesse R. Fann, Felix H. Fischer, Daniel Fung, Bizu Gelaye, Felicity Goodyear-Smith, Catherine G. Greeno, Brian J. Hall, Patricia A. Harrison, Martin Härter, Ulrich Hegerl, Leanne Hides, Stevan E. Hobfoll, Marie Hudson, Thomas Hyphantis, Masatoshi Inagaki, Nathalie Jetté, Mohammad E. Khamseh, Kim M. Kiely, Yunxin Kwan, Femke Lamers, Shen-Ing Liu, Manote Lotrakul, Sonia R. Loureiro, Bernd Löwe, Anthony McGuire, Sherina Mohd-Sidik, Tiago N. Munhoz, Kumiko Muramatsu, Flávia L. Osório, Vikram Patel, Brian W. Pence, Philippe Persoons, Angelo Picardi, Katrin Reuter, Alasdair G. Rooney, Iná S. Santos, Juwita Shaaban, Abbey Sidebottom, Adam Simning, Lesley Stafford, Sharon Sung, Pei Lin Lynnette Tan, Alyna Turner, Henk C. van Weert, Jennifer White, Mary A. Whooley, Kirsty Winkley, Mitsuhiko Yamada, Andrea Benedetti and Brett D. Thombs
  • Please note a correction has been issued for this article.



Item 9 of the Patient Health Questionnaire-9 (PHQ-9) asks about thoughts of death and self-harm, but not suicidality. Although it is sometimes used to assess suicide risk, most positive responses are not associated with suicidality. The PHQ-8, which omits Item 9, is thus increasingly used in research. We assessed the correlation between PHQ-8 and PHQ-9 total scores and the equivalency of their diagnostic accuracy for detecting major depression.


We conducted an individual patient data meta-analysis. We fit bivariate random-effects models to assess diagnostic accuracy.


16 742 participants (2097 major depression cases) from 54 studies were included. The correlation between PHQ-8 and PHQ-9 scores was 0.996 (95% confidence interval 0.996 to 0.996). The standard cutoff score of 10 for the PHQ-9 maximized sensitivity + specificity for the PHQ-8 among studies that used a semi-structured diagnostic interview reference standard (N = 27). At cutoff 10, the PHQ-8 was less sensitive by 0.02 (−0.06 to 0.00) and more specific by 0.01 (0.00 to 0.01) among those studies (N = 27), with similar results for studies that used other types of interviews (N = 27). For all 54 primary studies combined, across all cutoffs, the PHQ-8 was less sensitive than the PHQ-9 by 0.00 to 0.05 (0.03 at cutoff 10), and specificity was within 0.01 for all cutoffs (0.00 to 0.01).
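To make the cutoff comparison concrete, the following is a minimal sketch of how sensitivity and specificity at cutoff 10 could be computed for both questionnaires within a single sample. It is not the bivariate random-effects model used in the meta-analysis. The function names and the six participants are hypothetical and serve only as an illustration; they are not study data. The sketch assumes each item is scored 0 to 3 and that a total at or above the cutoff counts as a positive screen.

```python
from typing import Sequence

CUTOFF = 10  # standard PHQ-9 cutoff named in the abstract


def phq_scores(items: Sequence[int]) -> tuple[int, int]:
    """Return (PHQ-8 total, PHQ-9 total) from nine item scores, each 0 to 3."""
    if len(items) != 9 or any(not 0 <= x <= 3 for x in items):
        raise ValueError("expected nine item scores between 0 and 3")
    phq9 = sum(items)
    phq8 = phq9 - items[8]  # the PHQ-8 omits Item 9
    return phq8, phq9


def sens_spec(scores: Sequence[int], cases: Sequence[bool],
              cutoff: int = CUTOFF) -> tuple[float, float]:
    """Sensitivity and specificity of 'score >= cutoff' against a reference diagnosis."""
    tp = sum(s >= cutoff and c for s, c in zip(scores, cases))
    fn = sum(s < cutoff and c for s, c in zip(scores, cases))
    tn = sum(s < cutoff and not c for s, c in zip(scores, cases))
    fp = sum(s >= cutoff and not c for s, c in zip(scores, cases))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data (NOT from the study): item responses and reference diagnoses.
responses = [
    [2, 2, 1, 1, 1, 1, 1, 0, 1],
    [3, 3, 2, 2, 2, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0],
    [2, 2, 2, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [3, 2, 2, 2, 1, 1, 1, 0, 2],
]
is_case = [True, True, False, False, False, True]

totals = [phq_scores(r) for r in responses]
phq8_totals = [t[0] for t in totals]
phq9_totals = [t[1] for t in totals]

sens8, spec8 = sens_spec(phq8_totals, is_case)
sens9, spec9 = sens_spec(phq9_totals, is_case)
print(f"PHQ-8: sensitivity={sens8:.2f}, specificity={spec8:.2f}")
print(f"PHQ-9: sensitivity={sens9:.2f}, specificity={spec9:.2f}")
```

In this toy sample the first participant scores exactly 10 on the PHQ-9 only because of a response on Item 9, so dropping that item moves the participant below the cutoff. This is the mechanism by which the PHQ-8 can be slightly less sensitive at the same cutoff.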


PHQ-8 and PHQ-9 total scores were similar. Sensitivity may be minimally reduced with the PHQ-8, but specificity is similar.


Corresponding author

Authors for correspondence: Andrea Benedetti and Brett D. Thombs






