
Chapter 4 - Critical Appraisal of Studies of Diagnostic Test Accuracy

Published online by Cambridge University Press: 02 May 2020

Thomas B. Newman
University of California, San Francisco
Michael A. Kohn
University of California, San Francisco


We have learned how to quantify the accuracy of dichotomous (Chapter 2) and multilevel (Chapter 3) tests. In this chapter, we turn to critical appraisal of studies of diagnostic test accuracy, with an emphasis on problems with study design that affect the interpretation or credibility of the results. After a general discussion of an approach to studies of diagnostic tests, we will review some common biases to which studies of test accuracy are uniquely or especially susceptible and conclude with an introduction to systematic reviews of test accuracy studies.

Evidence-Based Diagnosis: An Introduction to Clinical Epidemiology, pp. 75-109
Publisher: Cambridge University Press
Print publication year: 2020


