Increasingly, new technologies that supplement clinical decision-making are being implemented to respond to the need to improve mental health treatment outcomes.Reference Rosenfeld, Benrimoh, Armstrong, Mirchi, Langlois-Therrien and Rollins1 Some of these tools are designed to be used at the point of care, during sessions with patients, and may be expected to have some impact on physician–patient interactions, which may, in turn, affect the physician–patient relationship, one of the most critical aspects of psychiatric intervention.Reference Nolan and Badger2 It is challenging to test the effect of tools on these interactions and on clinical workflow, as directly observing clinical interviews can be impractical or raise concerns about the validity of observations.
We assess the use of simulation to directly observe the impact of an artificial intelligence-powered decision support tool on simulated patient–clinician interactions. The objective was to determine if and how the use of the tool during a session affected the physician–patient interaction, as a prelude to longitudinal clinical studies assessing longer-term effects on clinical workflow and the physician–patient relationship. Using simulation, clinician behaviour can be observed in a secure setting,Reference Lamé and Dixon-Woods3 and data can be collected from multiple viewpoints, i.e. that of the clinician, the standardised patient and the observer. This triangulation process is a rigorous method for gathering high-quality data.Reference Patton4 We discuss the challenges encountered and insights gained from our experience with simulation-based testing of new technology.
Background on depression treatment challenges
As noted, in this paper we focus on the simulation centre testing of a tool aimed at supporting clinical decision-making during treatment selection for depression. This is an important field of work because depression is a common condition, with over one in nine people experiencing it in their lifetime,Reference Bromet, Andrade, Hwang, Sampson, Alonso and De Girolamo5 and carries a high burden, now being the leading cause of disability globally.Reference Ferrari, Charlson, Norman, Patten, Freedman and Murray6,7 Although many people with depression remain undiagnosed,Reference Williams, Chung and Muennig8 among those who are diagnosed and receive treatment, only roughly a third will achieve remission during a first treatment course,Reference Warden, Rush, Trivedi, Fava and Wisniewski9 with many patients needing to go through multiple treatment trials before finding an effective treatment. Physicians (both psychiatrists and primary care physicians) are faced with a large selection of effective treatments, as well as guidelines which help to manage treatments once they are chosen, but they do not currently have access to tools that can help them effectively choose between the existing first-line agents to optimise chances of treatment success and minimise the need for repeated trial-and-error treatment.Reference Benrimoh, Fratila, Israel, Perlman, Mirchi and Desai10 This need for improved decision support has led to a number of projects aimed at improving the personalisation of treatment selection, notably pharmacogenomics.Reference Greden, Parikh, Rothschild, Thase, Dunlop and DeBattista11 However, pharmacogenomic testing may be expensive, and samples may take time to be processed, time that could otherwise be used to treat the patient.
One solution would be a tool that can assist with the personalisation of treatment at the point of care, using readily available clinical and demographic data; for this purpose, a number of researchersReference Chekroud, Zotti, Shehzad, Gueorguieva, Johnson and Trivedi12,Reference Webb, Trivedi, Cohen, Dillon, Fournier and Goer13 have explored the use of machine learning and artificial intelligence as collections of techniques that can assess complex patterns (such as are found in patient data) and link them to outcomes (such as remission). In this paper, the tool discussed utilises artificial intelligence to provide clinicians with estimates of the likely efficacy of different treatments, to assist them in shared decision-making about which agent to try first with their patient. Future iterations of the tool will extend this to decision-making after treatment failure. Regardless of the specific point in the care pathway of a given patient, learning about the acceptability and useability of these kinds of clinical decision support systems (CDSSs) will be key to maximising their clinical impact, and this was a key purpose of the present study. This study was meant to observe how clinicians interact with the tool, as a step in its development and the development of training protocols for clinical studies involving the tool.
Aifred: clinical decision support software for depression treatment
We investigated the use of Aifred, a CDSS that includes an operationalised version of the 2016 Canadian Network for Mood and Anxiety Treatments (CANMAT) guidelines for depression treatment,Reference Kennedy, Lam, McIntyre, Tourjman, Bhat and Blier14 and provides artificial intelligence decision support when treatments are chosen. This artificial intelligence helps support clinicians by considering complex interactions between multiple patient variables to help personalise treatment in order to improve upon a trial-and-error treatment approach and reduce the number of failed treatment trials.Reference Benrimoh, Fratila, Israel, Perlman, Mirchi and Desai10,Reference Mehltretter, Rollins, Benrimoh, Fratila, Perlman and Israel15 It also tracks symptoms by using standardised questionnaires such as the Patient Health Questionnaire-9.Reference Kroenke, Spitzer and Williams16 Major depressive disorder (MDD) was chosen, given its high prevalence,Reference Kessler and Magee17,Reference Lam, Mcintosh, Wang, Enns, Kolivakis and Michalak18 status as the leading cause of disability globally19 and poor remission rates following initial treatment.Reference Warden, Rush, Trivedi, Fava and Wisniewski9
The key innovation is the inclusion of an artificial intelligence tool that provides clinicians with remission probabilities for different treatment options, based on a patient's clinical and demographic profile. This artificial intelligence is layered on top of the operationalised CANMAT guidelines, providing remission probabilities for individual treatments at the point in the guideline when the first-line treatment is chosen. The expected clinical utility of this artificial intelligence model is as follows. As noted, clinicians currently mostly follow a trial-and-error pattern when selecting treatments for depression, and, beyond providing a pool of first-line treatments, the guidelines are not able to precisely guide the selection of individual agents at the beginning of treatment. Although at the population level these treatments are considered to be essentially equally effective,Reference Cipriani, Furukawa, Salanti, Chaimani, Atkinson and Ogawa20 the low rates of remission after initial treatment,Reference Warden, Rush, Trivedi, Fava and Wisniewski9 the varying pharmacological profiles and different efficacy of even similar antidepressants,Reference Kennedy, Lam, McIntyre, Tourjman, Bhat and Blier14 and the clinical observation that different patients seem to respond to different treatments, have resulted in efforts to try and identify patterns with machine-learning tools, so as to predict the efficacy of specific agents for individual patients based on clinical and demographic information,Reference Chekroud, Zotti, Shehzad, Gueorguieva, Johnson and Trivedi12 or combining this information with biomarkers.Reference Iniesta, Hodgson, Stahl, Malki, Maier and Rietschel21 The Aifred tool provides remission probabilities for a number of treatments simultaneously, providing the clinician with extra information to help select a treatment within the pool of those recommended by the guidelines. 
This is meant to provide an estimate of likely treatment benefit, which can be used, alongside consideration of side-effects, medical history and patient preferences, with the intention of optimising treatment choice and reducing the chance a patient will start a treatment that is less likely to help them reach remission. Without these probabilities, there is very little information available to help clinicians select between first-line treatments with respect to their likely efficacy. This artificial intelligence tool is a deep-learning model, trained and validated on baseline clinical and demographic data from 4735 patients from five major studiesReference Mehltretter, Rollins, Benrimoh, Fratila, Perlman and Israel15 (STAR*DReference Warden, Rush, Trivedi, Fava and Wisniewski9, CO-MEDReference Rush, Trivedi, Stewart, Nierenberg, Fava and Kurian22, EMBARCReference Trivedi, McGrath, Fava, Parsey, Kurian and Phillips23, REVAMPReference Trivedi, Kocsi, Thase, Morris, Wisniewski and Leon24 and IRL-GREYReference Lenze, Mulsant, Blumberger, Karp, Newcomer and Anderson25). Patient clinical and demographic features, such as fatigue, physical symptoms and employment status, were identified with a feature selection pipeline described in Mehltretter et al,Reference Mehltretter, Rollins, Benrimoh, Fratila, Perlman and Israel15 and were then used to train a deep neural network. This network's objective was to predict patient remission status, and the drug assigned to the patient in the study was retained as a predictive feature. Once the model was trained, probabilities for remission for each treatment for a new patient could be derived by feeding that patient's clinical and demographic data into the model and then iterating over each of the possible treatments via the treatment-assigned variable. 
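The iteration over the treatment-assigned variable described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `toy_model` is a stand-in for the trained deep neural network, the two patient features and all weights are invented for illustration, and none of it reflects the actual Aifred implementation. It only shows the general pattern of holding a patient's data fixed while varying the treatment input.

```python
import math

# Treatment names used for illustration only.
TREATMENTS = ["escitalopram", "citalopram", "bupropion", "venlafaxine", "sertraline"]


def toy_model(features):
    """Hypothetical stand-in for a trained network.

    Maps a feature dict (including a 'treatment' entry) to a remission probability.
    All weights below are invented and carry no clinical meaning.
    """
    score = -0.6 - 0.3 * features["fatigue"] + 0.2 * features["employed"]
    # Invented per-treatment terms, so that outputs differ between treatments.
    treatment_terms = {
        "escitalopram": 0.10,
        "citalopram": 0.00,
        "bupropion": 0.25 * features["fatigue"],
        "venlafaxine": 0.05,
        "sertraline": 0.15,
    }
    score += treatment_terms[features["treatment"]]
    return 1.0 / (1.0 + math.exp(-score))  # logistic function keeps output in (0, 1)


def remission_probabilities(model, patient, treatments):
    """Keep the patient's data fixed and vary only the treatment-assigned variable."""
    probabilities = {}
    for treatment in treatments:
        candidate = dict(patient, treatment=treatment)  # copy; the input dict is untouched
        probabilities[treatment] = model(candidate)
    return probabilities


patient = {"fatigue": 1, "employed": 0}  # hypothetical new patient
probs = remission_probabilities(toy_model, patient, TREATMENTS)
for name, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {p:.1%}")
```

Because the treatment enters the model as just another input feature, a single trained model can produce one probability per candidate treatment for the same patient.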
The model currently provides individualised remission probabilities for five commonly used first-line treatments (escitalopram, citalopram, bupropion, venlafaxine and sertraline) and two combination treatments (bupropion plus escitalopram, and venlafaxine plus mirtazapine). The remission probabilities are presented as follows: for each treatment for which a probability can be calculated, a raw remission probability (e.g. 45%) is presented next to the name of the treatment. This probability represents the chance that the individual patient in question will reach remission, assuming appropriate use of the treatment as per guidelines, and an appropriate treatment trial. By clicking on a button labelled ‘more’, included next to each treatment, clinicians were able to see the baseline population remission rate based on the data-set used to train the model (in our case, this was 34.85%), as well as the ‘interpretability report’. This report was a list of up to five of the patient variables that were most important in producing the probability for that drug for the given patient; these were derived using a feature importance algorithm described in Mehltretter et al,Reference Mehltretter, Rollins, Benrimoh, Fratila, Perlman and Israel15 which would produce different sets of features for each treatment, for each patient. In silico testing of this model demonstrated that it is potentially capable of improving population remission rates (testing methods described in Mehltretter et alReference Mehltretter, Rollins, Benrimoh, Fratila, Perlman and Israel15). Future versions of this model are planned to increase the number of predicted treatments, and also to include psychotherapies and augmentation treatments. 
Note, however, that the focus of this paper is not on the specific artificial intelligence model (which may continue to evolve until the start of clinical trials), but on the impact of such a model, packaged within a digital health platform, on the patient–clinician interaction.
It should also be noted that the integrated CANMAT guidelines provide the most support in terms of the longitudinal management of depression treatment (i.e. when to switch or augment treatments in the case of poor response). As such, in this study, which focused on a single interaction, they functioned mostly to provide an evidence-based pool of initial treatment options that could be differentiated by the artificial intelligence model on a patient-by-patient basis, as well as guideline-derived treatment initiation advice (for example, by reminding clinicians of the benefit of combining pharmacotherapy and psychotherapy). In future studies, the combined effect of longitudinal management using the guidelines and the optimisation of initial treatment selection using the artificial intelligence will be studied, but this is out of scope for the present paper.
The tool is intended to be used during patient interviews, providing access to evidence-based decision support. It was designed with a simple interface intended to minimise time spent clicking through menus so that clinicians could focus on reviewing data and the artificial intelligence results, ideally while discussing and viewing them with their patient as part of shared decision-making. Numerical remission probabilities are provided for those treatments on which the model is trained, but clinicians can choose from any of the treatments appearing in CANMAT. This simulation study sought to assess whether the tool, which should always be employed in the context of best clinical judgement and patient preference, could be feasibly used at the point of care as well as maintaining, or possibly enriching, the integrity of the physician–patient interaction.
When designing the Aifred tool, one of the primary considerations was how the tool could support shared decision-making between clinicians and patients, in accordance with best practices.Reference Hopwood26 Indeed, the tool was developed using an informal participatory process where patient input was sought on design during development, and several members of the core development team had lived experience with depression and other mental health conditions, and had experienced treatment selection interactions with clinicians. The tool also at a number of points makes reference to the importance of discussing treatment preferences with patients, as per best practices.Reference Hopwood26 However, despite the fact that shared decision-making is an integral part of good clinical practice, the fact remains that not all clinicians engage in shared decision-making at all times,Reference Hopwood26 and the format of this may change in a clinician-dependent manner. In the context of the deployment of a new tool, we decided to observe how clinicians interact with this tool and use, or not use, it as part of shared decision-making without being explicitly prompted on how to do so. This is why the computer was chosen to be a laptop (which can be easily moved) and why it was positioned at 45 degrees (i.e. with the screen part-way between the patient and the clinician, to allow it to be moved one way or the other and remain in a comfortable position for the clinician to begin using). This provided a useful setup to observe clinician behaviour (i.e. to see if they would turn the screen toward the patient or turn it toward themselves, potentially even before they have had a chance to read prompts on the screen), and then to get feedback from standardised patients about how different clinician approaches to using the tool affected their experience.
Previous decision support research
Although previous studies have suggested that treatment utilising a clinical decision algorithm and measurement-based care leads to better patient outcomes,Reference Trivedi, Rush, Crismon, Kashner, Toprac and Carmody27,Reference Adli, Wiethoff, Baghai, Fisher, Seemüller and Laakmann28 often these studies included support from a clinical team or other non-computerised support.Reference Trivedi, Rush, Crismon, Kashner, Toprac and Carmody27,Reference Dobscha, Corson, Hickam, Perrin, Kraemer and Gerrity29 As such, it is worth reviewing previous work aimed at using computer-based CDSSs to improve depression treatment. Rollman et alReference Rollman, Hanusa, Lowe, Gilbert, Kapoor and Schulberg30 created a system that helped screen patients for depression and then offered guideline-based treatment advice messages. In a study of 200 patients in primary care, this tool did not show a positive effect on patient outcomes at 3 or 6 months. One major technical limitation of this system was that the tool relied on research assistants to program advice messages, and these were not sent to the clinician during clinical encounters, which may have limited its utility.
The Texas Medication Algorithm Project (TMAP) led to the development of a computerised version of its clinical algorithm, called CompTMAP, which assisted physicians in decisions such as adjusting doses, starting augmentation treatments and following patient progress in an expert guideline-informed manner.Reference Trivedi, Kern, Grannemann, Altshuler and Sunderajan31 This tool was tested in an unblinded study of 55 patients, where the group of patients treated using the CDSS showed improvement over standard of care in terms of patient depression symptoms.Reference Kurian, Trivedi, Grannemann, Claassen, Daly and Sunderajan32 More recently, Harrison et alReference Harrison, Carr, Goldsmith, Young, Ashworth and Fennema33 published a protocol for an upcoming study of a computerised decision support system implementing National Institute of Health and Care Excellence guidelines, which appears to, similarly to CompTMAP, take in patient information and suggest treatment approaches depending on treatment response and the relevant sections of the guidelines. Although all three of these systems offer the ability to screen patients, follow their response to treatment and suggest treatment course changes based on patient response and relevant guidelines (i.e. they support treatment management), none offer the ability to personalise treatment choice and differentiate between specific treatments based on an individual patient's profile (beyond making suggestions about when to alter treatment or add an augmenting agent, as per criteria set out by the guidelines). 
In the study of depressed inpatients carried out by Adli et al,Reference Adli, Wiethoff, Baghai, Fisher, Seemüller and Laakmann28 one arm of the study included a computerised system that did have some extent of prediction based on individual patient data: it used data from 650 patients to calculate probabilities of treatment failure or success during follow-up based on depression symptom scores for an individual patient, although it only provided general advice in response to this. For example, the authors state that the system could provide a recommendation that a physician review the treatment or consider an augmenting agent; as such, this system performed in a similar fashion to the guidelines (which already recommend treatment changes based on clinical improvement, or lack thereof, based on symptom scores at different points in treatment) and was outperformed by a more specific, structured clinical treatment algorithm. As such, no system before Aifred, to the best of our knowledge, combines the ability to implement clinical practice guidelines during patient encounters and patient follow-up (that is, optimising treatment management) with a machine-learning system that provides patient-and-drug specific remission probabilities (i.e. with a view to optimising personalised treatment selection). In this study, we focused on the most novel component offered by Aifred – this personalisation component – to determine how its integration into the information available to a clinician during a patient interaction, using a computerised CDSS, might affect the patient–clinician interaction, with a view to using this information to inform the conduct of future studies of this tool.
For the present study, the sample consisted of the intended end-users of the CDSS: psychiatry and family medicine attending staff and residents. Participants were recruited via email, social media and announcements, and were compensated. The recruitment target was 25 participants. Recruitment started roughly 3 months before study start. This study was approved by the Research Ethics Board of the Douglas Mental Health University Institute (ethical approval number: IUSMD 18-03). All participants, including standardised patients, provided written informed consent to participate. The study was conducted in accordance with the Tri-Council Statement on research ethics.
The study was conducted at the Steinberg Centre for Simulation and Interactive Learning. Each participant was present at the simulation centre for one 2.5-h session. The centre's one-way mirror system allowed research assistants to observe scenarios. The simulation centre has a roster of professional actors who play standardised patients (SPs). The ability of SPs to standardise their actingReference Beullens, Rethans, Goedhuys and Buntinx34,Reference Shirazi, Sadeghi, Emami, Kashani, Parikh and Alaeddini35 allows for multiple equivalent instances of the same clinical scenario to be run. Research assistants wrote observations on data extraction forms created for the study.
We created three clinical situations, corresponding to mild, moderate and severe MDD. These situations were based on data from real patients drawn from the de-identified data-sets on which the model was trained. ‘Jack’ was a retired White male in his 80s, suffering from mild depression marked by social withdrawal and sleep disturbance. He was experiencing some guilt about a previous divorce. ‘Emma’ was a White professional female in her 40s, suffering from moderate depression marked by agitation and guilt about poor performance at work and about being emotionally unavailable in her relationship. ‘Sara’ was a Black female in her 50s who had lost her job because of severe depression marked by psychomotor retardation and fatigue. She was prompted to come in to see the doctor by her friends in the building where she lives. The CDSS provided different remission probabilities per treatment for each patient.
Participants arrived in groups of up to six, and were given an introductory session that covered the current state of depression treatment, the rationale for the development of an artificial intelligence-powered tool, current results of the artificial intelligence model and an introduction to the user interface of the tool. They were told that the standardised patients were playing patients who had used the tool to fill out questionnaires in the ‘waiting room’, but had limited knowledge of the tool.
Participants were paired with a research assistant, who guided them through a 10-min training session with the CDSS on a laptop. Participants then filled out a questionnaire recording their initial impressions of the tool. Each participant then interacted with all three standardised patients in a random order in three 10-min clinical scenarios. During scenarios participants were free to interact with a laptop computer running the CDSS. The laptop was angled at 45 degrees toward the participant, but could be freely moved to face the standardised patient. The CDSS had access to questionnaire results as well as the treatment algorithm with its integrated artificial intelligence tool. Participants were warned that as scenarios were only 10 min long, they should consider starting to use the CDSS roughly halfway through; however, they were also told that they had the freedom to use or ignore the CDSS as they saw fit. After each scenario, participants filled out a questionnaire about their experience using the CDSS.
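The random ordering of the three scenarios can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the source does not describe how the order was generated, so the `assign_orders` helper, the participant numbering and the fixed seed are all hypothetical.

```python
import random

# Scenario names taken from the three standardised patients described in the text.
SCENARIOS = ["Jack", "Emma", "Sara"]


def assign_orders(participant_ids, scenarios, seed=None):
    """Give each participant an independently shuffled order of all scenarios.

    Hypothetical helper: the actual randomisation procedure is not described
    in the source.
    """
    rng = random.Random(seed)  # a local generator keeps the result reproducible per seed
    return {pid: rng.sample(scenarios, k=len(scenarios)) for pid in participant_ids}


# Twenty hypothetical participant identifiers; the seed is arbitrary.
orders = assign_orders(range(1, 21), SCENARIOS, seed=0)
for pid, order in orders.items():
    print(pid, " -> ".join(order))
```

Each participant still sees every scenario exactly once; only the sequence varies, which helps separate the effect of a scenario from the effect of growing familiarity with the tool.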
After the scenarios, there was a 10-min structured interview with a research assistant in which participants were able to elaborate further on their experience. They were then asked to complete an anonymous ‘exit’ questionnaire summarising their experience using the tool and their opinion of its impact on the physician–patient interaction. The last step was a 10-question surprise quiz on the CANMAT 2016 Guidelines for Depression Treatment, intended to establish participant knowledge of guidelines. After each testing day, an unstructured debriefing session was held with all standardised patients. Although standardised patient feedback is often not standardised, standardised patients have been shown to effectively assess clinical skills,Reference Kessler and Magee17,19,Reference Cipriani, Furukawa, Salanti, Chaimani, Atkinson and Ogawa20,Reference Beullens, Rethans, Goedhuys and Buntinx34,Reference Bokken, Linssen, Scherpbier, Vleuten and Rethans36,Reference Park, Son, Kim and May37 which motivated us to consider standardised patient feedback when assessing the impact of the tool on the clinician–patient interaction. See Fig. 1 for a flowchart of tasks participants completed during the study.
Description of tool development and decision to use simulation centre testing
The development pathway of the Aifred system is that of a medical device. The first steps involved needs assessments, discussions with stakeholders (such as physicians and patients) and the creation of a prototype, which was reviewed by independent experts (six psychiatrists). Then, in a process mirroring that described in Trivedi et al,Reference Trivedi, Kern, Grannemann, Altshuler and Sunderajan31 programming of the prototype into a functional application was overseen by the clinical authors working on the project and tested by them, fake patient data was input into the system to test and refine it, and then data from real patients (in our case, data from patients in the studies used to train the machine-learning system) was used for testing and the development of simulation scenarios. Concurrently, as in Trivedi et al,Reference Trivedi, Kern, Grannemann, Altshuler and Sunderajan31 field testing with physicians (ongoing at present) has been used to collect feedback on the design and clinical validity and utility of a version of the tool without the artificial intelligence enabled (as the version of the tool with artificial intelligence enabled is a medical device that must only be used as part of clinical trials and related studies). The fact that our tool includes a novel artificial intelligence/machine-learning component prompted further reflection on what studies were necessary to understand the impact of this novel component on the implementation of the CDSS. As a result, we decided we required a process evaluation, which, as discussed by Lamé and Dixon-Woods,Reference Lamé and Dixon-Woods3 involves taking a ‘look at how the intervention is implemented and received’ and can be carried out, among other options, using a simulation setting. 
Simulation centres are beneficial not only for clinical tool assessment during development, but also for the simulation of realistic patient outcomes: a recent systematic review and meta-analysis of 33 studies found that simulation-based assessments involving healthcare professionals using technology-enhanced simulation in the context of patient care correlate positively with patient-related outcomes.Reference Brydges, Hatala, Zendejas, Erwin and Cook38 However, the quality of methods and reporting has been insufficient, a limitation we aimed to address by aligning our methods with previous research. Our development of a simulation centre study to conduct our process evaluation mirrors closely the method described by Colman et alReference Colman, Doughty, Arnold, Stone, Reid and Dalpiaz39 for developing simulation-based testing for healthcare spaces: as noted, we began with stakeholder engagement and needs assessment, and discussed the project and the simulations with a multidisciplinary team including computer scientists, clinicians, patients and people with research skills in fields such as anthropology. Clinical scenarios were then developed based on real patient data and situations that were likely to be encountered by the end-users of the CDSS. Standardised patients were then trained; an advantage of using the Steinberg Simulation Centre was that the standardised patients were professional actors skilled at preparing and standardising their performances, using a standard training process managed by simulation centre staff.Reference Mueller, Cyr, Bank, Bhanji, Birnbaum and Boillat40 A testing day was then held as suggested by Colman et al,Reference Colman, Doughty, Arnold, Stone, Reid and Dalpiaz39 with run-throughs of patient scenarios, a walk-through of the simulation space and a review of all training documents prepared for the testing day.
The testing days were then held, with standardised patient and staff debriefings occurring each day, as suggested by Colman et al,Reference Colman, Doughty, Arnold, Stone, Reid and Dalpiaz39 and this was then followed by data analysis and the creation of manuscripts for publication. We structured our analysis and reporting to assess some of the metrics of effective medical education as discussed by Dixon;Reference Dixon41 we chose medical education as a model given that the simulation centre experience did effectively act as a training session for use of the tool for physicians who participated. In this case, relevant areas of assessment as per Dixon,Reference Dixon41 were perception and opinion about the experience (often measured as satisfaction), knowledge or skills gained, and impact on clinical practice (with the latter only being inferred from responses clinicians gave about their likely future use of the tool).
Results are derived from the registration and exit questionnaires, unless otherwise noted, and comment on participant satisfaction, knowledge and skills gained, and potential impact on clinical practice. Note that these are initial selected results meant to illustrate the utility of the simulation centre; full study results will be reported separately.
Twenty participants completed the study. Participants were nearly evenly split between psychiatry (n = 11) and family medicine (n = 9), with a wide range of ages (24–67 years, mean age 39.5 years) and practice experience (6 residents and the following breakdown in experience for attending staff: 0–5 years: 4, 6–10 years: 2, 11–15 years: 2, 16–20 years: 4, ≥21 years: 2). The sample included participants practising in hospital and community settings.
With respect to participant satisfaction and impact on the physician–patient interaction, 70% of participants felt that the artificial intelligence model assisted them in helping their patients better understand treatment (scoring ≥4 on a scale of 1–5, with higher values representing greater confidence). Sixty-five per cent of participants felt it helped improve patient trust in the treatment (scoring ≥4 on a scale of 1–5). Fifty per cent of participants felt that the application provided them with richer information to discuss with their patients (scoring ≥4 on a scale of 1–5). Forty-five per cent of participants reported that using the application made the interaction with patients feel less personal or that it interfered with their interview (scoring ≥4 on a scale of 1–5). Seventy per cent of participants felt the remission probabilities provided by the model were reasonable overall.
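The percentages above are proportions of participants scoring at or above 4 on a five-point item. A minimal Python sketch of that calculation follows; the `percent_at_or_above` helper is hypothetical and the response values are invented (they are not the study data), chosen only so that 14 of 20 responses meet the threshold.

```python
def percent_at_or_above(scores, threshold=4):
    """Percentage of responses on a 1-5 scale that are at or above the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    return 100.0 * sum(1 for score in scores if score >= threshold) / len(scores)


# Invented responses from 20 hypothetical participants: 14 score 4 or 5.
example_scores = [5] * 6 + [4] * 8 + [3] * 4 + [2] * 2

print(percent_at_or_above(example_scores))  # 70.0
```

With 20 participants, each participant contributes 5 percentage points, so every reported figure for a single exit-questionnaire item should be a multiple of 5%.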
In terms of potential impact on clinical practice, 50% of participants thought they would use the CDSS for all of their patients with MDD, with an additional 40% (therefore 90% overall) stating they would use it for more complex or treatment-resistant patients. Sixty per cent of participants trusted that the artificial intelligence could help them choose treatments (scoring ≥4 on a scale of 1–5). Eighty per cent of participants felt that the information on the treatment selection page in the application (which indicated CANMAT-recommended treatments, their usual doses and the artificial intelligence predictions) contained information that was clinically useful (scoring ≥4 on a scale of 1–5). This suggests that the information contained in the tool could augment clinician knowledge during their interactions with patients. See Table 1 for a summary of results.
Before the simulation, 75% of participants reported that they would realistically use the application in clinic for 5 min or less during a session. Forty per cent of participants reported that the application would save them time (scoring ≥4 on a scale of 1–5), and 30% felt the application would neither save nor cost them time (scoring 3 on a scale of 1–5), indicating potential feasibility in a real, busy clinical environment. This was corroborated by the fact that, in the majority of scenarios, the participants were able to successfully navigate through the application within the short time provided. In a questionnaire administered right after each clinical scenario, 61.7% of participants reported that using the application ‘took some adjustment, but […] worked well’. Standardised patients provided valuable feedback, such as noting that some participants turned the computer screen toward them during the session, ‘inviting them in’ to engage with the tool. This seemed to be linked to the standardised patients' acceptance of the tool's presence. They also commented on the importance of the clinician's manner and rapport-building skills, such as warmth and ability to engage them in their care.
We will now reflect on the use of simulation for testing the effect of new technologies on the physician–patient interaction. Our initial results demonstrate that a majority of clinicians were satisfied with the use of the CDSS. At the end of the simulation, most clinicians could see themselves using the tool for at least a subset of their patients with depression, suggesting the feasibility of using the tool to achieve real-world impact. No major threats to the quality of the physician–patient interaction were identified, and we illustrated several ways in which the tool might enhance the interaction, as well as approaches clinicians can use to better integrate the CDSS into a session.
Our sample of 20 participants was diverse with respect to career stage and practice environment, which increases confidence in the generalisability of our results. The sample size reflects recruitment feasibility. The largest barrier to recruitment was clinical duties and, for residents, concerns about not being released to participate. Being able to offer more testing days, as well as departmental approval for residents’ participation, may have increased recruitment. A challenge with simulation is that running participants in groups on predefined days is necessary given the need to ensure room and standardised patient availability.
Using simulation-based testing allowed us to observe interactions that would not have been easily accessible in other settings. As noted, some participants tended to turn the laptop toward their standardised patient. Standardised patients referred to this as participants ‘inviting them in’; this behaviour seemed to be important in determining their experience of the tool. Standardised patient feedback and our observations of sessions revealed that traditional aspects of the physician–patient interaction, such as clinician warmth, body language and ability to engage the patient, were also important in determining the standardised patient experience, suggesting that the impact of a new technology may depend on clinicians’ baseline ability to build rapport with their patients. This merits further investigation in a clinical environment. Self-report from clinicians also revealed important effects of the CDSS on the physician–patient interaction, such as the perceived utility of the tool in helping them better explain and increase trust in treatment. This interplay of observations of clinician behaviour, clinician self-report and standardised patient experiences provided fundamentally different information than would have been obtained through clinician self-report alone. These observations will influence clinician training provided in future clinical studies, resulting in more focus on how clinicians can engage the patient with the tool in-session and use it to provide more information and enhance patient trust.
External validity is a concern when using simulation-based testing.Reference Lamé and Dixon-Woods3 For example, several participants noted in written comments and during interviews that the 10-min training session was insufficient and that they would likely have become more comfortable with the CDSS with more time. However, external validity may depend on research aims.Reference Lamé and Dixon-Woods3 In our case, the aim was to see if the application was intuitive to use with minimal training, and, as noted, the majority of participants felt the tool took some adjustment but worked well. Similarly, the 10-min clinical session length was felt by multiple participants to be too short. We initially hypothesised that most clinicians would want a tool that they could use in 5 min, and this was supported by the finding that, at baseline, 75% of participants could see themselves using the tool for 5 min or less. Having short sessions, in which most participants used the tool in the latter half of the session, allowed us to determine that it is possible to use the tool in a meaningful way within this time constraint. As such, our research aims were well suited to simulation work.
The use of the simulation environment – and crucially, of standardised patients – to test the impact of technology on the physician–patient interaction is both practically useful and important as it allows direct observation of clinician interaction with a new tool before patient studies. This method provides multiple points of observation, allowing for an informative and multifaceted data-set that can inform the development of tools and training materials. Evaluating the ease with which new technology is used and integrated into clinical practice is a key step in the proper development and implementation of novel clinical tools, and is a useful prelude to more longitudinal studies on the impact of these tools on the clinician–patient relationship.
With respect to engaging patients in shared decision-making in the context of CDSS use, during this study physicians could have chosen any number of approaches. For example, they could have started by turning the screen toward the patient; kept the screen toward themselves while discussing the treatments and artificial intelligence results with the patient; or referred to the CDSS with the screen turned toward them, and then put it away and discussed treatment with the patient without explicitly discussing the CDSS and its results. The finding that standardised patients were more accepting of the tool when clinicians turned the screen toward them and ‘invited them in’ is not surprising in and of itself. However, it is instructive as it provides a concrete and simple behaviour that seems to have a significant impact on patient experience, the promotion of which can be included as part of the training for clinicians using the tool in the clinic or as part of coming clinical studies. It is also a finding that helps us determine which of the possible clinician behaviours in response to the tool would be most likely to be supportive of patients feeling engaged in decision-making. In addition, having actually observed the importance of this behaviour under simulation conditions may potentially help convince clinicians to adopt it.
In previous research, Trivedi et alReference Trivedi, Daly, Kern, Grannemann, Sunderajan and Claassen42 identified several barriers to implementation of a computerised decision support system. These included concerns about the time required to use the system in practice, technical challenges relating to computer literacy, and the need for physicians to be involved in the development of the tool and to have the ability to override system recommendations. Accordingly, we designed the tool to ensure physician autonomy by allowing physicians to select any treatment or action they deemed appropriate. We included physicians in both the initial design and the ongoing iterative design process, of which this simulation centre study is a part. Finally, we designed the tool to be easy to use quickly and intuitively during a patient encounter. As noted, physicians were able, after a short training session, to use the tool effectively within a short clinical encounter. This study aimed to assess, as part of the development of this tool, whether we are on the right track in addressing some of the barriers previously noted in CDSS implementation; the present results provide preliminary indications that this is the case, which will be further assessed in a clinical feasibility study.
This study has a number of limitations and serves as only an initial step in the examination of the effect of this tool on the clinical process, with its main purpose being the identification of significant problems in the patient–clinician encounter when using the tool as well as the refinement of training materials for further clinical studies. In line with Dixon'sReference Dixon41 comments when discussing the validity of medical education evaluation, one cannot assume that changes in physician knowledge or skills, or satisfaction with the training or the tool itself, will directly lead to improved patient outcomes; furthermore, one cannot assume that the impressions physicians had of the tool with respect to its potential effect on their practice would be borne out once they begin using it in clinic. The present study does, however, help establish that physicians seem open to trying this tool in clinic, that they can be easily trained to use it in a manner they find satisfactory and that there is some agreement among physicians that the tool has potential clinical utility. As such, the next step will be to conduct a longitudinal feasibility study of the tool in clinic, followed by a randomised controlled study aimed at assessing tool effectiveness and safety. The largest drawback of this simulation centre study, with respect to assessment of the effect of the tool on the patient–clinician interaction, is that it is impossible, in this setting, to assess longitudinal effects on the patient–clinician relationship; this underlines the importance of conducting a longitudinal feasibility study in clinic before large-scale clinical trials.
Although this study does not evaluate the effectiveness of this tool, it has provided valuable insights into how clinicians may use this type of tool and how the tool, and the training provided to clinicians who use it, may be further developed to increase the chance that it will have a positive impact on patient care.
We would like to acknowledge the Steinberg Centre for Simulation and Interactive Learning for the helpfulness of their staff in assisting with the execution of this study, as well as the standardised patients who participated in the study, for their excellence and the quality of their feedback.
Use of the simulation centre and the work of the standardised patients were provided as part of the prize for a clinical innovation competition run by McGill University and the Steinberg Centre for Simulation and Interactive Learning, Canada, with the generous support of the Hakim family. Research assistants, software and participant compensation were provided by Aifred Health. The Canadian Federal Government's Youth Employment Program also provided a grant to M.T.-S. to support this work (grant number: 933792).
D.B. worked on conceptualizing and running the study and writing the protocol, oversaw analysis, and worked on creating and revising the draft manuscript. M.T.-S. worked on running the study, organizing research assistants, oversaw analysis, and worked on creating and revising the draft manuscript. K.P. and S.I. helped conceptualize the study and write the protocol, collected data and helped revise the manuscript. K.P. also contributed to data analysis. J.M., C.A. and R.F. created, in collaboration with D.B., the AI model tested in the study and designed how it would report information. They also helped revise the manuscript. S.V.P., J.F.K., K.H. and I.V.V. provided comments on study protocol and measures, provided guidance on the analysis, and helped significantly revise the manuscript. In addition, S.V.P. originated the idea for this format of manuscript and provided a number of references. J.F.K. provided part of the data used to train the AI model. S.K., S.N.V., G.M., R.M. and D.M.B. helped assess the clinical validity of the treatment algorithm into which the AI was inserted and helped revise the manuscript. C.P., E.L., M.W., J.W., G.S., T.P., J.-F.T. and K.R. were all research assistants who assisted in data collection and analysis, and revised the manuscript. C.R. was a research assistant who helped write the original protocol, assisted with data analysis and revised the manuscript. E.S. was a research assistant who assisted with data analysis and revising the manuscript. M.M. and G.T. assisted in the development of the original protocol, provided research questions for data analysis and also helped revise the manuscript. L.G.C. and O.L. provided comments on the data analysis and helped revise the manuscript. H.C.M. helped conceptualize the study and produce the original protocol, and oversaw data analysis. He also significantly revised the manuscript.
Declaration of interest
D.B., M.T.-S., K.P., S.I., J.M., C.A., R.F., C.R. and M.M. are shareholders, employees or directors of Aifred Health. C.P., E.L., E.S., M.W., J.W., G.S., T.P. and K.R. were research assistants paid by Aifred Health. S.V.P., J.F.K. and K.H. are members of Aifred Health's scientific advisory board and either have received or may in the near future receive shares in the company. H.C.M. has received honoraria, sponsorship or grants for participation in speaker bureaus, consultation, advisory board meetings and clinical research from Acadia, Amgen, HLS Therapeutics, Janssen-Ortho, Mylan, Otsuka-Lundbeck, Perdue, Pfizer, Shire and SyneuRx International. All other authors report no relevant conflicts.