
Equivalence and non-inferiority testing in psychotherapy research

Published online by Cambridge University Press:  11 May 2018

Falk Leichsenring*
Department of Psychosomatics and Psychotherapy, Justus-Liebig-University Giessen, Ludwigstr 76, D-35392 Giessen, Germany
Allan Abbass
Department of Psychiatry, Dalhousie University, Centre for Emotions and Health, Halifax 8203-5909 Veterans Memorial Lane, Halifax, NS B3H 2E2, Canada
Ellen Driessen
Department of Clinical, Neuro and Developmental Psychology, Amsterdam Public Health research institute, Vrije Universiteit Amsterdam, Van der Boechorststraat 1, 1081 BT Amsterdam, The Netherlands
Mark Hilsenroth
Derner School of Psychology, Adelphi University, Hy Weinberg Center, 1 South Avenue, Garden City, NY 11530-0701, USA
Patrick Luyten
Faculty of Psychology and Educational Sciences, University of Leuven, Klinische Psychologie (OE), Tiensestraat 102 – bus 3722, 3000 Leuven, Belgium; Research Department of Clinical, Educational and Health Psychology, University College London, Gower Street, London WC1E 6BT, UK
Sven Rabung
Department of Psychology, Alpen-Adria-Universität Klagenfurt, Universitätsstr. 65-67, A-9020 Klagenfurt, Austria
Christiane Steinert
Department of Psychosomatics and Psychotherapy, Justus-Liebig-University Giessen, Ludwigstr 76, D-35392 Giessen, Germany; Department of Psychology, MSB Medical School Berlin, Calandrellistr. 1-9, 12247 Berlin, Germany
Author for correspondence: Falk Leichsenring, E-mail:


Copyright © Cambridge University Press 2018 

With more than 100 non-inferiority or equivalence trials published per year in many areas of research (Piaggio et al., 2012), the statistical and methodological issues involved in these trials are becoming increasingly important. A recent article by Rief and Hofmann (2018), however, suggests that some of these issues are not sufficiently clear. We therefore discuss the central issues here and address some misunderstandings.

Equivalence and non-inferiority margins

No generally accepted standards exist for defining a non-inferiority or equivalence margin (i.e. the minimum difference important enough to make treatments non-equivalent). In a review of 332 equivalence or non-inferiority medical trials, the median margin was 0.50 standard deviations (Lange and Freitag, 2005), which corresponds quite well to the value of 0.42 reported by Gladstone and Vach (2014). Only five studies used margins < 0.25 (Gladstone and Vach, 2014), and only 12% of studies used margins ⩽ 0.25 (Lange and Freitag, 2005).

In psychotherapy research, margins ranging from 0.24 to 0.60 have been proposed (e.g. Steinert et al., 2017, p. 944). In a meta-analysis of psychodynamic therapy (PDT) covering different mental disorders, Steinert et al. (2017) chose a margin of g = 0.25, which is among the smallest margins ever used in psychotherapy and medical research (Gladstone and Vach, 2014, Figure 2; Steinert et al., 2017, p. 944). This margin is very close to both (a) the threshold for a minimally important difference suggested specifically for depression (0.24; Cuijpers et al., 2014) and (b) the margin recommended by Gladstone and Vach (2014) to protect against degradation of treatment effects in non-inferiority trials (d = −0.23).

In their recent correspondence article, Rief and Hofmann (2018) make a quite different proposal, recommending that margins not fall below 90% of the uncontrolled effect size of the established treatment. This proposal, however, raises several problems, described in more detail in Table 1, particularly regarding the clinical significance of the suggested margin and its implications for sample size determination, which would render non-inferiority trials in psychotherapy research virtually impossible (Table 1).
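Why the size of the margin matters so much for sample size can be illustrated with the standard normal-approximation formula for a non-inferiority comparison of two means on a standardized scale, n per group = 2(z₁₋α + z₁₋β)²/Δ². The sketch below assumes equal group sizes, a true difference of zero, one-sided α = 0.025 and 80% power. These settings and the function name `n_per_group_noninferiority` are illustrative assumptions, not values taken from Rief and Hofmann (2018) or from Table 1. The margin of 0.10 is included only as a hypothetical example of a margin well below 0.25.

```python
import math
from statistics import NormalDist


def n_per_group_noninferiority(margin: float, alpha: float = 0.025, power: float = 0.80) -> int:
    """Approximate patients per group for a non-inferiority test of two means.

    `margin` is the non-inferiority margin in standard-deviation units (e.g. Hedges' g).
    Assumptions: normal approximation, equal group sizes, true difference of zero,
    one-sided significance level `alpha`.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    n = 2 * (z(1 - alpha) + z(power)) ** 2 / margin ** 2
    return math.ceil(n)  # round up to whole patients


if __name__ == "__main__":
    # Margins mentioned in the text (0.50, 0.42, 0.25) plus a hypothetical smaller one (0.10).
    for margin in (0.50, 0.42, 0.25, 0.10):
        print(f"margin g = {margin:.2f}: {n_per_group_noninferiority(margin)} per group")
    # margin g = 0.50: 63 per group
    # margin g = 0.42: 89 per group
    # margin g = 0.25: 252 per group
    # margin g = 0.10: 1570 per group
```

Because n grows with 1/Δ², halving the margin roughly quadruples the required sample. Under these assumptions a margin of g = 0.25 already calls for about 252 patients per group.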

Table 1. Further methodological issues of equivalence and non-inferiority testing

a Paul Crits-Christoph, personal communication, 16 February 2018.

b Paul Crits-Christoph, personal communication, 26 February 2018.

Statistical hypotheses in equivalence and non-inferiority testing

In equivalence testing, the null and alternative hypotheses of superiority testing are reversed, so that the statistical alternative hypothesis is consistent with the assumption of equivalence (Lesaffre, 2008; Walker and Nowacki, 2011). To test for equivalence, two one-sided tests are performed to determine whether both the upper and the lower boundary of the confidence interval (CI) lie within the margin, whereas non-inferiority is tested with a single one-sided test that inspects the lower boundary (Lesaffre, 2008; Walker and Nowacki, 2011). A statistically significant result here implies that the effect size and its CI lie within the margin, demonstrating equivalence or non-inferiority (Walker and Nowacki, 2011). A recent meta-analysis testing the equivalence of PDT to other approaches of established efficacy reported a significant result, indicating that the effect sizes and their CIs were completely included in the margin (Steinert et al., 2017). Thus, the recent interpretation by Rief and Hofmann (2018, p. 2) that Steinert et al. (2017) ‘… found a significant disadvantage of PDT [psychodynamic therapy] compared with other treatments (including CBT)’ is simply wrong (Lesaffre, 2008; Walker and Nowacki, 2011).
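The hypotheses can be written out formally. Here δ is the standardized difference in outcome between the test treatment and the comparator, with positive values favouring the test treatment, and Δ > 0 is the margin. This notation is ours, added for illustration:

```latex
\begin{aligned}
&\text{Superiority:}        && H_0:\ \delta = 0                                     && \text{vs.} && H_1:\ \delta \neq 0 \\
&\text{Equivalence (TOST):} && H_0:\ \delta \le -\Delta \ \text{or}\ \delta \ge \Delta && \text{vs.} && H_1:\ -\Delta < \delta < \Delta \\
&\text{Non-inferiority:}    && H_0:\ \delta \le -\Delta                             && \text{vs.} && H_1:\ \delta > -\Delta
\end{aligned}
```

Equivalence is declared when both one-sided null hypotheses (δ ≤ −Δ and δ ≥ Δ) are rejected at level α. This is the same as the (1 − 2α) CI for δ lying entirely inside (−Δ, Δ). Non-inferiority is declared when the lower bound of that CI lies above −Δ.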

Equivalence v. non-inferiority testing

Equivalence and non-inferiority testing need to be distinguished (Treadwell et al., 2012). In non-inferiority testing, for example, the test treatment is expected to be superior to the standard treatment on measures not related to efficacy, such as side effects or costs (Treadwell et al., 2012). Rief and Hofmann did not make this distinction. The meta-analysis by Steinert et al. (2017), for instance, was a test of equivalence, not of non-inferiority as Rief and Hofmann (2018) suggested.
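A short sketch makes the distinction concrete. It applies both decision rules from the section on statistical hypotheses to the same estimate. The sketch relies on these assumptions:

* A normal approximation.
* A standardized between-group difference `diff` (test minus comparator, positive favouring the test treatment) with standard error `se`.
* The margin of 0.25 discussed above.
* α = 0.05 for each one-sided TOST test (a 90% CI) and one-sided α = 0.025 for non-inferiority (a 95% CI).

All function names and (diff, se) values are hypothetical and are not taken from Steinert et al. (2017) or any other cited study.

```python
from statistics import NormalDist

# Convention assumed for this sketch: diff = test minus comparator in standardized
# units, so positive values favour the test treatment.


def confidence_interval(diff: float, se: float, level: float) -> tuple[float, float]:
    """Two-sided normal-theory confidence interval at the given level."""
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - crit * se, diff + crit * se


def tost_p(diff: float, se: float, margin: float) -> float:
    """Larger of the two one-sided p-values of the TOST procedure."""
    phi = NormalDist().cdf
    p_lower = 1 - phi((diff + margin) / se)  # H0: diff <= -margin
    p_upper = phi((diff - margin) / se)      # H0: diff >= +margin
    return max(p_lower, p_upper)


def is_equivalent(diff: float, se: float, margin: float, alpha: float = 0.05) -> bool:
    """Equivalence: the whole (1 - 2*alpha) CI lies inside (-margin, +margin)."""
    lo, hi = confidence_interval(diff, se, 1 - 2 * alpha)
    return -margin < lo and hi < margin


def is_noninferior(diff: float, se: float, margin: float, alpha: float = 0.025) -> bool:
    """Non-inferiority: only the lower bound of the (1 - 2*alpha) CI is checked."""
    lo, _ = confidence_interval(diff, se, 1 - 2 * alpha)
    return lo > -margin


if __name__ == "__main__":
    margin = 0.25
    cases = (
        (-0.05, 0.07),  # equivalent and non-inferior
        (0.40, 0.07),   # non-inferior but not equivalent: test treatment better than the margin
        (-0.20, 0.07),  # neither: CI crosses the lower margin
    )
    for diff, se in cases:
        print(diff, is_equivalent(diff, se, margin), is_noninferior(diff, se, margin),
              round(tost_p(diff, se, margin), 4))
```

The second case shows why the two procedures must be kept apart. A treatment that is clearly better than its comparator passes the non-inferiority test but fails the equivalence test, because its CI extends beyond the upper margin.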

Assay sensitivity and constancy of study conditions

Equivalence and non-inferiority testing require that the efficacy of the comparator is ensured and that the study conditions are comparable with those in which the efficacy of the comparator was established (Treadwell et al., 2012). In this context, Rief and Hofmann (2018) claim that specific issues of (low) study quality, e.g. low response rates found in specific studies or low treatment integrity, favour non-inferiority results. Again, however, these claims are not supported by evidence (Table 1). The same applies to several further issues raised by Rief and Hofmann (2018), which are briefly discussed in Table 1, for example the relationship between equivalence testing and the number of studies available for a specific treatment.


Equivalence and non-inferiority testing pose specific methodological problems (Piaggio et al., 2012; Treadwell et al., 2012), for example in defining a margin, in statistical testing, and in ensuring the efficacy of the comparator or the comparability of study conditions (Table 1). We have presented conclusions about equivalence and non-inferiority testing that differ from those of Rief and Hofmann (2018) and that are more consistent with the available evidence and with usual standards across a range of scientific disciplines.


Connolly Gibbons, MB et al. (2016) Comparative effectiveness of cognitive therapy and dynamic psychotherapy for major depressive disorder in a community mental health setting: a randomized clinical noninferiority trial. JAMA Psychiatry 9, 904–911.
Cuijpers, P et al. (2014) What is the threshold for a clinically relevant effect? The case of major depressive disorders. Depression and Anxiety 31, 374–378.
Cuijpers, P et al. (2016) How effective are cognitive behavior therapies for major depression and anxiety disorders? A meta-analytic update of the evidence. World Psychiatry 15, 245–258. doi: 10.1002/wps.20346.
Driessen, E et al. (2013) The efficacy of cognitive-behavioural therapy and psychodynamic therapy in the outpatient treatment of major depression: a randomized clinical trial. American Journal of Psychiatry 170, 1041–1050. doi: 10.1176/appi.ajp.2013.12070899.
Gladstone, BP and Vach, W (2014) Choice of non-inferiority (NI) margins does not protect against degradation of treatment effects on an average – an observational study of registered and published NI trials. PLoS ONE 9, e103616.
Lange, S and Freitag, G (2005) Therapeutic equivalence – clinical issues and statistical methodology in noninferiority trials choice of delta: requirements and reality – results of a systematic review. Biometrical Journal 47, 12–27.
Lesaffre, E (2008) Superiority, equivalence, and non-inferiority trials. Bulletin of the NYU Hospital for Joint Diseases 66, 150–154.
McGlothlin, AE and Lewis, RJ (2014) Minimal clinically important difference: defining what really matters to patients. JAMA 312, 1342–1343.
Munder, T et al. (2013) Researcher allegiance in psychotherapy outcome research: an overview of reviews. Clinical Psychology Review 33, 501–511.
Persons, JB, Bostrom, A and Bertagnolli, A (1999) Results of randomized controlled trials of cognitive therapy for depression generalize to private practice. Cognitive Therapy and Research 23, 535–548.
Piaggio, G et al. (2012) Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement. JAMA 308, 2594–2604.
Rief, W and Hofmann, SG (2018) Some problems with non-inferiority tests in psychotherapy research: psychodynamic therapies as an example. Psychological Medicine, 1–3.
Steinert, C et al. (2017) Psychodynamic therapy: as efficacious as other empirically supported treatments? A meta-analysis testing equivalence of outcomes. American Journal of Psychiatry 174, 943–953.
Thoma, NC et al. (2012) A quality-based review of randomized controlled trials of cognitive-behavioral therapy for depression: an assessment and metaregression. American Journal of Psychiatry 169, 22–30.
Treadwell, JR et al. (2012) Assessing equivalence and noninferiority. Journal of Clinical Epidemiology 65, 1144–1149.
Walker, E and Nowacki, AS (2011) Understanding equivalence and noninferiority testing. Journal of General Internal Medicine 26, 192–196.
Webb, CA, DeRubeis, RJ and Barber, J (2010) Therapist adherence/competence and treatment outcome: a meta-analytic review. Journal of Consulting and Clinical Psychology 78, 200–211.