Measurement-based care (MBC)Reference Fortney, Unützer, Wren, Pyne, Smith and Schoenbaum 1 , Reference Trivedi, Rush, Wisniewski, Nierenberg, Warden and Ritz 2 has beneficial effects on achieving response and remission of mental health disorders, such as depression.Reference Fortney, Unützer, Wren, Pyne, Smith and Schoenbaum 1 – Reference Davidson, Perry and Bell 6 In addition, MBC can enhance effective communication between patients and clinicians and involvement of patients in clinical decision-making.Reference Fortney, Unützer, Wren, Pyne, Smith and Schoenbaum 1 , Reference Carlier, Meuldijk, van Vliet, van Fenema, van der Wee and Zitman 5 , Reference Valenstein, Adler, Berlant, Dixon, Dulit and Goldman 7 – Reference van der Feltz-Cornelis, Andrea, Kessels, Duivenvoorden, Biemans and Metz 9 Despite these promising prospects of MBC, the progress in the application of outcome measurement in routine mental healthcare is slow,Reference Van Der Wees, Nijhuis-van der Sanden, Ayanian, Black, Westert and Schneider 10 , Reference Delespaul 11 because of the complexity of its implementation.Reference Boswell, Kraus, Miller and Lambert 12 – Reference Duncan and Murray 16
To promote outcome measurement in routine clinical practice in Dutch specialised mental healthcare, a government-sponsored National Quality Improvement Collaborative (QIC) was initiated.Reference Schouten, Hulscher, van Everdingen, Huijsman and Grol 17 – Reference Metz, Franx, Veerbeek, de Beurs, Van der Feltz-Cornelis and Beekman 20 This National Collaborative offered a unique opportunity to investigate the actual use of outcome measurement in clinical practice and to assess the perceived utility of this so-called routine outcome monitoring (ROM).Reference Carlier, Meuldijk, van Vliet, van Fenema, van der Wee and Zitman 5 , Reference van der Feltz-Cornelis, Andrea, Kessels, Duivenvoorden, Biemans and Metz 9 , Reference de Jong, Timman, Hakkaart-van Royen, Vermeulen, Kooiman and Passchier 14 , Reference de Beurs, den Hollander-Gijsman, van Rood, van der Wee, Giltay and van van Noorden 21 The results of this evaluation study, conducted within the National Collaborative, are presented in this paper.
This evaluation study was conducted within the National ROM QIC, which aimed to accelerate the implementation of ROM in clinical practice (for details see ‘Intervention’). The study used a parallel group design with matched pairs of participating teams, in which a cluster randomised controlled trial (RCT) was embedded (Fig. 1). In both groups, we investigated the primary outcome: the actual use of ROM in clinical practice and the perceived clinical utility of outcome measurement. In addition, we tested whether there were differences among three groups of clinicians (physicians, psychologists and nurses).
The participating specialised mental healthcare providers were each requested to enrol two similar teams. In total, 21 intervention teams across the country participated in the Collaborative and the survey. Fourteen of them had a matched control team from the same provider, treating the same patient group (age, diagnosis and setting) in the same geographical catchment area. Of the 14 matched pairs, 6 were randomly and 8 non-randomly assigned to the intervention or the control condition. The randomisation of the six matched pairs was conducted by an independent data managerReference Metz, Franx, Veerbeek, de Beurs, Van der Feltz-Cornelis and Beekman 20 (Dutch Trial Register, NTR5262) (Fig. 1). The 14 control teams conducted ROM ‘as usual’ and implemented the best practice only after the end of the study.
In the teams not participating in the RCT, the participating mental health organisations were allowed to choose which of their two parallel teams was assigned to the experimental arm of the study and which was assigned to the control condition. Both teams were treating similar patient groups in the same geographical catchment area, just as the matched pairs of the randomised teams. In this paper, we present the results of the parallel group design and the nested RCT.
The teams consisted of three groups of clinicians: physicians, psychologists and nurses. The exact multidisciplinary composition depended on the patient group to be treated (e.g. nurses typically work in chronic care and psychologists in short-term curative out-patient treatment). For the study, no patient involvement was required; thus, no informed consent was needed.
Intervention: National ROM QIC
The Collaborative promoted the routine use of clinical outcome questionnaires or rating scales at the beginning, during and at the end of the treatment. Clinicians were asked to discuss the ROM results with their patients to guide treatment decisions jointly. To help implement this ROM practice, the participating teams followed a National QIC programme of 1-year duration. A QIC is a multifaceted implementation strategy.Reference Schouten, Hulscher, van Everdingen, Huijsman and Grol 17 – Reference Øvreveit, Bate, Cleary, Cretin, Gustafson and McInnes 19 It comprised a mix of improvement methods, applied both nationally and locally (in the teams). Conference days, training and booster sessions for exchange and learning, with experts and patient representatives present, were important national components of the improvement strategy. Moreover, the local teams, with involvement of patient representatives and supported by their management, determined their own improvement plans, specified in goals, actions and indicators. The multidisciplinary local teams organised meetings at their own location to work on their improvement plans. The teams planned, implemented, evaluated and adjusted their plans to improve the application of ROM in clinical practice in Plan-Do-Check-Act cycles.Reference Øvreveit, Bate, Cleary, Cretin, Gustafson and McInnes 19 , Reference Berwick 22 , Reference van Splunteren, van Everdingen, Janssen, Minkman, Rouppe van de Voort and Schouten 23 After the Collaborative ended, all control teams were offered the intervention.
Measurements: primary outcome
The primary outcome, the actual use and perceived clinical utility of ROM in clinical practice, was assessed by a surveyReference Nuijen, Wijngaarden, Veerbeek, Franx, Meeuwissen and Bon van-Martens 24 for clinicians at two moments: at the beginning (T 0) and at the end (T 1) of the QIC (after 1 year). Data collection took place independent of the Collaborative, by a data management team. Clinicians were invited and received a reminder by email to fill out the survey. The results were processed anonymously, and respondents were only labelled by team.
The survey had previously been developed by the Trimbos Institute, Netherlands Institute of Mental Health and Addiction.Reference Nuijen, Wijngaarden, Veerbeek, Franx, Meeuwissen and Bon van-Martens 24 Commissioned by the Ministry of Health, Welfare and Sport, the survey aims to identify the degree of implementation of ROM. The items of the survey were based on a systematic literature search of studies into influencing factors to ROM implementation and on expert meetings that assessed and rated the relevance of the identified factors. After a pilot test among clinicians, this development process resulted in a survey with 22 statements measuring the use of ROM in clinical practice from the perspective of the clinician. All statements had five response categories, ranging from ‘strongly disagree’ (score 1) to ‘strongly agree’ (score 5). A higher score meant a better implementation and use of ROM in clinical practice.Reference Nuijen, Wijngaarden, Veerbeek, Franx, Meeuwissen and Bon van-Martens 24 Exploratory factor analysis demonstrated a four-factor structure of the instrument:
• Individual use and perceived utility of ROM in daily practice (eight items), for example ‘I use the ROM scores to evaluate the course of treatment’
• Use of ROM in the team and organisational preconditions (seven items), for example ‘ROM scores are used in multidisciplinary consultations’
• Usefulness of the ROM questionnaires (four items), for example ‘The questionnaires are suitable for measuring change’
• Accessibility of ROM for patient and clinician (three items), for example ‘The output of ROM is simple and attractive’
In addition, a total scale score is calculated by summing all the items. The internal consistency of the total scale and of the domain ‘Individual use and perceived utility of ROM in daily practice’ is very good (α=0.93 and α=0.91, respectively). The Cronbach's alphas of two other domains are good: ‘Use of ROM in the team and organisational preconditions’ α=0.86 and ‘Usefulness of the ROM questionnaires’ α=0.86. The internal consistency of the domain ‘Accessibility ROM for patient and clinician’ is less adequate (α=0.51). However, this scale was retained in the survey for two reasons. First, the content of these items is important. According to implementation literatureReference Fortney, Unützer, Wren, Pyne, Smith and Schoenbaum 1 , Reference Carlier, Meuldijk, van Vliet, van Fenema, van der Wee and Zitman 5 , Reference Boswell, Kraus, Miller and Lambert 12 , Reference de Jong, Timman, Hakkaart-van Royen, Vermeulen, Kooiman and Passchier 14 , Reference Duncan and Murray 16 , Reference de Beurs, den Hollander-Gijsman, van Rood, van der Wee, Giltay and van van Noorden 21 and experiences in the intervention teams, the accessibility of ROM results for patients and clinicians is an important precondition for using ROM in clinical practice (i.e. giving feedback on outcome data to patients and clinicians, communicating about the results, validating and using the information for (changes in) treatment plans). Second, a Cronbach's alpha >0.5 is deemed just acceptable, with a minimum of three items contributing to the domain.Reference Vet de, Terwee, Mokkink and Knol 25
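The internal consistency figures above are Cronbach's alphas. As a minimal sketch of how such a coefficient is computed from item-level responses, the snippet below uses the standard formula α = k/(k−1) × (1 − Σ item variances / variance of the total score). The function name and the response data are hypothetical illustrations, not the study's data or its SPSS procedure.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    items = list(zip(*responses))  # regroup scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses (NOT study data) on a three-item domain scored 1-5.
hypothetical_responses = [
    [4, 5, 4],
    [2, 3, 3],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 2],
]

alpha = cronbach_alpha(hypothetical_responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

With only three items, as in the accessibility domain, alpha is sensitive to each individual item, which is one reason short scales often show lower values.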
Analysis was performed on the four subdomains of the survey and the total scale score. Data were analysed with SPSS for Windows, version 22. First, the number of teams, the number of drop-outs from the study, the response to the survey and the composition of the teams that responded to the survey were described. Chi-squared tests were used to test potential differences in the composition of teams between the intervention and control groups. To calculate differences between T 0 and T 1 and the difference at T 1 between the intervention and control groups, independent sample t-tests were used, because the clinicians of the participating teams who filled out the survey at T 0 and T 1 were not always the same. Mean, standard deviation, confidence intervals and effect sizes were computed. The effect sizes were calculated by the following formula: $d = (M_{\text{post}} - M_{\text{pre}})/SD_{\text{pooled}}$, where, because of independent groups, $SD_{\text{pooled}} = \sqrt{(SD_1^2 + SD_2^2)/2}$, using the effect size calculator for separate groups of L. Becker, University of Colorado. The thresholds for interpreting the effect size were small (0.00–0.32), medium (0.33–0.55) and large (0.56–1.20).Reference Lipsey and Wilson 26
We repeated the analysis described above for the randomised teams (the nested RCT). Finally, in the intervention group of the parallel group design we looked at the differences between three main groups of clinicians (physicians, psychologists and nurses). Independent sample t-tests were used to calculate differences between T 0 and T 1 for each group of clinicians separately. Differences between the groups of clinicians at T 0 and T 1 were tested with analysis of variance (ANOVA) and post hoc tests (Bonferroni).
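The effect size calculation described above (mean difference divided by a pooled standard deviation for independent groups) can be sketched as follows. The function names are illustrative assumptions. The input values are the total-score means and standard deviations of the parallel group intervention teams as reported in Table 1 (T 0) and Table 2 (T 1). The `interpret` helper applies the small/medium/large thresholds quoted in the text, and treating everything from 0.56 upwards as large is an assumption of this sketch.

```python
from math import sqrt


def pooled_sd(sd1, sd2):
    """Pooled SD for independent groups: sqrt((SD1^2 + SD2^2) / 2)."""
    return sqrt((sd1 ** 2 + sd2 ** 2) / 2)


def cohens_d(m_pre, sd_pre, m_post, sd_post):
    """Effect size: (M_post - M_pre) / SD_pooled."""
    return (m_post - m_pre) / pooled_sd(sd_pre, sd_post)


def interpret(d):
    """Label an effect size using the thresholds quoted in the text."""
    size = abs(d)
    if size < 0.33:
        return "small"
    if size < 0.56:
        return "medium"
    return "large"  # assumption: values above 1.20 are also labelled large


# Total score, parallel group intervention teams:
# T 0 mean 2.94 (s.d. 0.64) from Table 1, T 1 mean 3.57 (s.d. 0.63) from Table 2.
d_total = cohens_d(2.94, 0.64, 3.57, 0.63)
print(round(d_total, 2), interpret(d_total))  # 0.99 large
```

The result agrees with the effect size of 0.99 reported for the total scale.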
This study was designed to detect, in the intervention teams of the parallel group design, a medium effect size of d=0.5 on the primary outcome ‘actual use and the perceived clinical utility of ROM in clinical practice’ comparing T 1 with T 0. With α=0.05 and a power (1−β) of 0.80, the required sample size was 65 clinicians in the intervention group.Reference Lipsey 27
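As a rough plausibility check on the sample size above, the sketch below applies the common normal-approximation formula for comparing two independent means, n = 2(z₁₋α/₂ + z_power)² / d². This is an assumption for illustration, not necessarily the method behind the cited figure, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided two-sample comparison."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


n = n_per_group(0.5)
print(n)  # 63
```

The approximation gives 63 per measurement, slightly below the 65 reported, which is expected because the normal approximation ignores the small-sample correction of the t-distribution.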
In each paragraph, the results are first described for the total parallel group design and next for the nested randomised design. Putative differences in effects among types of clinicians are shown for the parallel group design.
Parallel group design
Twenty-one teams from organisations of specialised mental healthcare across the country participated (see Fig. 2, flowchart 2a). In 14 of them, two similar teams were included. Flowchart 2a shows that, during the Collaborative, three teams dropped out between T 0 and T 1, mainly because of reorganisations and personnel changes in the participating teams.
At T 0, 69% of the clinicians in the intervention group and 75% in the control group responded to the survey. The types of clinicians responding to the survey in terms of profession were 11% physicians, 53% psychologists and 36% nurses in the intervention group, and 21% physicians, 43% psychologists and 36% nurses in the control group. The composition between intervention and control teams did not differ significantly.
At T 1, 89% of the clinicians in the intervention group and 62% in the control group responded to the survey. The composition of the clinicians responding to the survey was 25% physicians, 44% psychologists and 31% nurses in the intervention group, and 17% physicians, 40% psychologists and 43% nurses in the control group. As with T 0, the differences in composition at T 1 between intervention and control groups were not significant.
Cluster randomised control design
In Fig. 2, flowchart 2b shows loss of data over time in the randomised teams. In total, clinicians of six intervention teams and six control teams filled out the survey. Between T 0 and T 1, one team dropped out because of reorganisation and personnel changes. At T 0, 73% of the clinicians in the intervention group and 83% in the control group responded to the survey. The composition of the group of clinicians responding to the survey in terms of profession was 0% physicians, 65% psychologists and 35% nurses in the intervention teams, and 13% physicians, 67% psychologists and 20% nurses in the control teams.
At T 1, 73% of the clinicians in the intervention group and 75% in the control group responded to the survey. At T 1, the composition of these groups of clinicians responding to the survey was 9% physicians, 58% psychologists and 33% nurses in the intervention group, and 0% physicians, 54% psychologists and 46% nurses in the control group. Both at T 0 and T 1, there were no significant differences in the composition of clinicians between intervention and control groups.
Results of the survey
To demonstrate the changes in the actual use and perceived clinical utility of ROM in the teams that participated in the Collaborative, we first describe the difference between the first (T 0) and final (T 1) measurements of the intervention group. Second, we examine the differences between the intervention and control groups at the end of the Collaborative (T 1). The results are presented for both the parallel group design and the nested randomised design.
Differences between first and final measurements of the intervention group
Parallel group design
In the intervention group, significant positive differences were shown between T 0 and T 1 on the total scale and all subscales of the survey ‘ROM in daily practice’ with medium to large effect sizes (between 0.55 and 1.02, with an effect size of 0.99 on the total scale) (Table 1). The control group showed no significant differences between T 0 and T 1.
|Survey domains||Parallel group design intervention teams||Cluster randomised control trial intervention teams|
|Survey domain||Time||N||Mean||s.d.||Effect size||Sig. (2-tailed)||95% CI of the difference (lower)||95% CI of the difference (upper)||N||Mean||s.d.||Effect size||Sig. (2-tailed)||95% CI of the difference (lower)||95% CI of the difference (upper)|
|Individual use and perceived utility of ROM in daily practice||T 0||91||3.28||1.01||0.62||0.000||−0.84||−0.29||19||3.22||1.07||1.11||0.002||−1.47||−0.36|
|Use of ROM in the team and organisational preconditions||T 0||91||2.59||0.85||1.02||0.000||−1.05||−0.57||19||2.59||0.80||1.14||0.001||−1.36||−0.37|
|Usefulness of the ROM questionnaires||T 0||91||2.95||0.85||0.55||0.000||−0.77||−0.22||19||3.07||0.96||0.97||0.005||−1.35||−0.26|
|Accessibility ROM for patient and clinician||T 0||91||2.95||0.72||0.88||0.000||−0.86||−0.42||19||2.96||0.97||1.07||0.002||−1.39||−0.33|
|Total score of the ROM in daily practice||T 0||91||2.94||0.64||0.99||0.000||−0.82||−0.43||19||2.96||0.83||1.25||0.000||−1.31||−0.41|
CI, confidence interval; RCT, randomised controlled trial; ROM, routine outcome monitoring; Sig., significance.
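The T 0 versus T 1 comparison on the total score can be approximately re-derived from summary statistics alone. The sketch below assumes a pooled-variance (equal variances) independent-samples t-test and uses a normal critical value for the confidence interval. Both are assumptions of this illustration, as the exact SPSS settings are not reported. Inputs are the parallel group intervention values: T 0 from Table 1 and T 1 from Table 2.

```python
from math import sqrt
from statistics import NormalDist


def pooled_t(n1, m1, sd1, n2, m2, sd2):
    """Pooled-variance t-test from summary statistics; returns diff, SE, t, df."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    diff = m1 - m2
    return diff, se, diff / se, df


# Total score: T 0 (N=91, mean 2.94, s.d. 0.64) versus T 1 (N=79, mean 3.57, s.d. 0.63).
diff, se, t_stat, df = pooled_t(91, 2.94, 0.64, 79, 3.57, 0.63)

# Normal approximation to the 95% CI (the t critical value for df=168 is very close).
z = NormalDist().inv_cdf(0.975)
ci = (diff - z * se, diff + z * se)

print(f"t({df}) = {t_stat:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```

The interval comes out close to the reported −0.82 to −0.43. Small deviations are expected because the inputs are rounded to two decimals.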
Cluster randomised control design
The randomised group showed comparable results in the application of ROM in daily practice (Table 1). The effect sizes in the randomised intervention group were even larger (between 0.97 and 1.25, with an effect size of 1.25 on the total scale) than in the intervention group of the parallel group design (Table 1). Also in this design, the control group showed no significant differences between first and final measurements.
Differences in final measurements between intervention and control group
Parallel group design
When the differences at T 1 between the intervention and control groups were tested, the intervention group scored significantly higher than the control group at the final measurement (Table 2). This means that, at the end of the improvement year, ROM was better implemented and used in clinical practice by respondents in the intervention group than by respondents in the control group.
|Survey domains||Parallel group design T 1 intervention and control||Cluster randomised control trial T 1 intervention and control|
|Survey domain||Group||N||Mean T 1||s.d.||Sig. (2-tailed)||95% CI of the difference (lower)||95% CI of the difference (upper)||N||Mean T 1||s.d.||Sig. (2-tailed)||95% CI of the difference (lower)||95% CI of the difference (upper)|
|Individual use and perceived utility of ROM in daily practice||I||79||3.84||0.80||0.000||0.52||1.20||19||4.14||0.47||0.000||0.74||1.72|
|Use of ROM in the team and organisational preconditions||I||79||3.40||0.74||0.000||0.44||1.08||19||3.45||0.71||0.005||0.24||1.25|
|Usefulness of the ROM questionnaires||I||79||3.45||0.95||0.008||0.15||0.95||19||3.87||0.68||0.011||0.19||1.32|
|Accessibility ROM for patient and clinician||I||79||3.59||0.74||0.000||0.28||0.94||19||3.82||0.59||0.001||0.45||1.55|
|Total score of the ROM in daily practice||I||79||3.57||0.63||0.000||0.42||0.97||19||3.82||0.51||0.000||0.50||1.36|
C, control group; CI, confidence interval; I, intervention; RCT, randomised controlled trial; ROM, routine outcome monitoring; Sig., significance.
Cluster randomised control design
When the final measurements (T 1) were compared, the above-mentioned significant positive results in favour of the intervention teams were also found in the RCT (Table 2).
Differences between clinicians
When comparing the first and final measurements in the intervention group of the parallel group design (Table 3), nurses and psychologists in the intervention group demonstrated at T 1 a significantly higher score on all the survey domains with large effect sizes (nurses between 0.68 and 1.28; psychologists between 0.57 and 1.17). Physicians in the intervention group scored at T 1, compared with T 0, significantly higher on the total score and the subdomain ‘Use of ROM in the team and organisational preconditions’, with large effect sizes on these scales (0.97 and 1.51, respectively). The three groups of clinicians participating in the control group showed no significant increase at T 1 relative to T 0.
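|Survey domains||Nurses||Psychologists||Physicians|
|Survey domain||Time||N||Mean||s.d.||Effect size||Sig. (2-tailed)||95% CI of the difference (lower)||95% CI of the difference (upper)||N||Mean||s.d.||Effect size||Sig. (2-tailed)||95% CI of the difference (lower)||95% CI of the difference (upper)||N||Mean||s.d.||Effect size||Sig. (2-tailed)||95% CI of the difference (lower)||95% CI of the difference (upper)|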
|Individual use and perceived utility of ROM in daily practice||T 0||26||3.00||1.23||0.98||0.001||−1.52||−0.40||39||3.50||0.85||0.57||0.023||−0.85||−0.07||8||2.89||1.13||0.200||−1.47||0.33|
|Use of ROM in the team and organisational preconditions||T 0||26||2.49||1.05||1.11||0.001||−1.61||−0.48||39||2.56||0.69||1.17||0.000||−1.15||−0.48||8||2.13||0.85||1.51||0.001||−1.84||−0.51|
|Usefulness of the ROM questionnaires||T 0||26||2.90||0.79||0.68||0.024||−1.11||−0.08||39||3.00||0.87||0.59||0.018||−0.94||−0.09||8||2.63||1.11||0.148||−1.70||0.27|
|Accessibility ROM for patient and clinician||T 0||26||2.58||0.82||1.28||0.000||−1.44||−0.52||39||3.15||0.64||0.87||0.001||−0.87||−0.25||8||2.75||0.64||0.155||−1.21||0.20|
|Total score of the ROM in daily practice.||T 0||26||2.74||0.76||1.28||0.000||−1.31||−0.48||39||3.05||0.54||1.13||0.000||−0.84||−0.33||8||2.60||0.74||0.97||0.037||−1.43||−0.05|
CI, confidence interval; ROM, routine outcome monitoring; Sig., significance.
At T 0, compared with the psychologists of the intervention group, nurses of this group showed a significantly lower score on the domain ‘Accessibility ROM for patient and clinician’ (P=0.006 and CI=−1.020 to −0.134). During the Collaborative year, the differences between these groups of clinicians were reduced. At T 1, no significant differences were shown between the groups of clinicians in the intervention group.
This paper presents the findings from the government-sponsored National QIC, which aimed to accelerate the implementation of ROM in Dutch specialised mental healthcare. The study used a parallel group design with matched pairs of participating teams, in which a cluster RCT was nested. In both intervention and control teams, the actual use of ROM in routine clinical practice and the perceived clinical utility of outcome measurements were investigated at the beginning and end of the Collaborative.
In both the parallel group design and the nested RCT, the intervention teams reported significantly better results with respect to the actual use and the perceived clinical utility of ROM (Tables 1 and 2). In the parallel group design, which included 21 intervention teams across the country, the overall effect was large (d=0.99). Notably, the effect size in the nested RCT was even larger (d=1.25) than in the study with the parallel groups. This is probably because of the more rigorous research design and implementation protocol that was used in the RCT. Considering putative differences among specific groups of clinicians, psychologists and nurses participating in the intervention group demonstrated a large improvement on both the overall scale and all the subdomains, which measure different aspects of ROM implementation. The physicians taking part in the study showed a similarly large improvement on the overall scale. Looking at the specific subscales, their improvement was restricted to the domain ‘Use of ROM in the team and organisational preconditions’. This may be explained by the tasks physicians have in the teams, which are less focused on the execution of the ROM measures and more on team supervision and the organisation of care. Their assessments of the usefulness of ROM may have been driven more by the ROM-related activities they noticed in the team, represented by the subscale ‘Use of ROM in the team and organisational preconditions’. The other three subdomains reflect practical and executive functions in the application of ROM. The baseline difference between psychologists and nurses on the subdomain ‘Accessibility ROM for patient and clinician’ might be related to the background of psychologists, who are generally more inclined to use measurement instruments in daily practice.
It is encouraging to see that this targeted intervention succeeded in reducing the difference between psychologists and nurses, implying that the intervention was successful in engaging nursing personnel in this area that is so important for their work.
In this study, we had the unique opportunity to nest a rigorous experimental study (RCT) design within a government-sponsored national initiative to improve mental healthcare. We built on previous work in which the survey was developed.Reference Nuijen, Wijngaarden, Veerbeek, Franx, Meeuwissen and Bon van-Martens 24 The teams experienced ownership of their improvement process and were facilitated by the National QIC. A variety of teams with a multidisciplinary composition of clinicians treating different patient groups (age, diagnoses and setting) participated in the study. Data were collected independently by a data management team, which processed the results anonymously. Thus, the likelihood of socially desirable answers and of influence of the research team on the results was diminished. To limit the possible influence of confounding, the results were shown separately for the parallel group design and the nested cluster randomised design. A strength of the parallel group design was its large external validity, because of the number and variation of the participating teams. The randomised group included fewer teams, but the risk of confounding was reduced, and in this design we followed a strict research and implementation protocol.
The study also had some limitations which may have influenced the results. First, the clinicians were aware of the objective of the National Collaborative, which may have affected their answers on the survey. Second, there may have been cross-over effects of knowledge and experiences from the intervention to the control group. Third, the survey could be seen as a process evaluation, focusing on the implementation of ROM as seen by clinicians who participated in the Collaborative. To gain insight into the experiences of patients and the effectiveness of the intervention at patient level, an additional study is underway, which will examine the effects on patients' decisional conflict, working alliance, treatment adherence, clinical outcome and quality of life.Reference Metz, Franx, Veerbeek, de Beurs, Van der Feltz-Cornelis and Beekman 20 Finally, the follow-up is restricted, and it is unknown how the teams fared with ROM over a longer time. Given the large effect sizes between the final and first measurements and the attention that was given during the Collaborative to the continuity of the implementation afterwards, we expect the intervention teams will maintain the positive effects of the Collaborative. Nevertheless, it is still important to ensure that the teams continue the intervention by organising follow-up and booster sessions.
Notwithstanding the above limitations, our overall conclusion is that the implementation of outcome measurement in clinical practice was highly successful and appreciated by the multidisciplinary teams that were involved. All three groups of clinicians participating in the intervention group benefited from the ROM implementation and showed, at the end of the Collaborative, an equal level in the actual use and the perceived utility of ROM in clinical practice. A key element of the successful ROM implementation was the bottom-up approach, in which multidisciplinary teams were facilitated to complete their own improvement cycle. This study is unique in that we combined a National Collaborative of Quality Improvement in mental healthcare with an evaluation study in two designs, a parallel group design and a nested RCT. The results have both internal (with regard to the rigorous design and implementation) and external (given the nationwide implementation and evaluation) validity. Given the established advantages of MBC and the difficulties previously encountered in implementing the use of ROM in routine care, these results are encouraging and call for more implementation efforts along these lines.
The authors would like to thank Mr P. van Splunteren MSc and Mrs H. Sinnema PhD for the project management of the National Collaborative. They also thank the data management team of Trimbos Institute for their support in the data collection. Finally, they thank the clinicians of the participating organisations for responding to the survey.
The project is funded by the National Network for Quality Development in mental healthcare and conducted by the Trimbos Institute, Netherlands Institute of Mental Health and Addiction.