Benchmarking Routine Psychological Services: A Discussion of Challenges and Methods

  • Jaime Delgadillo (a1), Dean McMillan (a2), Chris Leach (a3), Mike Lucock (a3), Simon Gilbody (a2) and Nick Wood (a4)...


Background: Policy developments in recent years have led to important changes in the level of access to evidence-based psychological treatments. Several methods have been used to investigate the effectiveness of these treatments in routine care, with different approaches to outcome definition and data analysis.

Aims: To present a review of challenges and methods for the evaluation of evidence-based treatments delivered in routine mental healthcare. This is followed by a case example of a benchmarking method applied in primary care.

Method: High, average and poor performance benchmarks were calculated through a meta-analysis of published data from services working under the Improving Access to Psychological Therapies (IAPT) Programme in England. Pre-post treatment effect sizes (ES) and confidence intervals were estimated to illustrate a benchmarking method that enables services to evaluate routine clinical outcomes.

Results: High, average and poor performance ES for routine IAPT services were estimated to be 0.91, 0.73 and 0.46 for depression (using PHQ-9) and 1.02, 0.78 and 0.52 for anxiety (using GAD-7). Data from one specific IAPT service exemplify how to evaluate and contextualize routine clinical performance against these benchmarks.

Conclusions: The main contribution of this report is to summarize key recommendations for the selection of an adequate set of psychometric measures, the operational definition of outcomes, and the statistical evaluation of clinical performance. A benchmarking method is also presented, which may enable a robust evaluation of clinical performance against national benchmarks. Limitations include significant heterogeneity among data sources and wide variation in ES and data completeness.
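As a rough illustration of how a service might compare its own outcomes with the benchmarks reported above, the following Python sketch computes a pre-post effect size and places it relative to the published values. It rests on assumptions that the abstract does not confirm: the ES is taken here as the mean pre-treatment score minus the mean post-treatment score, divided by the pre-treatment standard deviation, and the banding rule in `classify` is a hypothetical convention chosen for illustration. The function names and sample scores are likewise invented; only the six benchmark values come from the abstract.

```python
from statistics import mean, stdev

# Benchmark effect sizes as reported in the abstract.
BENCHMARKS = {
    "PHQ-9": {"high": 0.91, "average": 0.73, "poor": 0.46},
    "GAD-7": {"high": 1.02, "average": 0.78, "poor": 0.52},
}


def pre_post_effect_size(pre, post):
    """Pre-post ES, ASSUMED here to be (mean_pre - mean_post) / SD_pre.

    The abstract does not state the exact formula used by the authors.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need paired scores for at least two patients")
    return (mean(pre) - mean(post)) / stdev(pre)


def classify(es, measure):
    """Place an ES relative to the benchmarks (hypothetical banding rule)."""
    b = BENCHMARKS[measure]
    if es >= b["high"]:
        return "at or above high benchmark"
    if es >= b["average"]:
        return "at or above average benchmark"
    if es > b["poor"]:
        return "between poor and average benchmarks"
    return "at or below poor benchmark"


if __name__ == "__main__":
    # Invented PHQ-9 scores for five patients, for illustration only.
    pre_scores = [10, 12, 14, 16, 18]
    post_scores = [6, 8, 10, 12, 14]
    es = pre_post_effect_size(pre_scores, post_scores)
    print(f"ES = {es:.2f}: {classify(es, 'PHQ-9')}")
```

A fuller evaluation would also report a confidence interval around the service's ES, as the Method describes, so that the comparison with each benchmark reflects sampling uncertainty and not just the point estimate.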


Corresponding author

Reprint requests to Jaime Delgadillo, Leeds Community Healthcare NHS Trust - Primary Care Mental Health, The Reginald Centre, Second Floor, 263 Chapeltown Road, Leeds LS7 3EX, UK. E-mail:



