In 1990 Jenkins identified an urgent need for a system of indicators to enable clinicians to monitor and evaluate mental healthcare.Reference Jenkins1 One reason identified for not routinely using standard outcome measures was the lack of appropriate instruments.Reference Slade, Thornicroft and Glover2
In 1998, Wing and colleaguesReference Wing, Beevor, Curtis, Park, Hadden and Burns3 developed the Health of the Nation Outcome Scales (HoNOS), an instrument covering symptoms, functioning, relationships and environmental issuesReference Laugharne, Eaves, Mascas, Psatha, Dinnis and Trower4,Reference Kisely, Campbell, Cartwright, Cox and Campbell5 that could be used routinely in the National Health Service (UK) to measure progress towards the target set by the Department of Health in the UK ‘to improve significantly the health and social functioning of mentally ill people’.6 Since then, the HoNOS and its adaptations for children and adolescents (HoNOSCA) and for those over 65 years of age (HoNOS65+) have been officially adopted in England, Australia, New ZealandReference James, Painter, Buckingham and Stewart7 and in other European countries.Reference Lovaglio and Monzani8–Reference Bilenberg10
Gowers et alReference Gowers, Harrington, Whitton, Lelliott, Beevor and Wing11 developed the HoNOSCA, for children and adolescents, as a set of scales to be used in child and adolescent mental health services.Reference Gowers, Bailey-Rogers, Shore and Levione12 The HoNOSCA has been widely used.Reference Laugharne, Eaves, Mascas, Psatha, Dinnis and Trower4,Reference Hanssen-Bauer, Gowers, Aalen, Bilenberg, Brann and Garralda13–Reference Brann, Alexander and Coombs19 It was designed to be brief, have a similar structure to the HoNOS and provide a broad, quantitative measure of severity, with sound psychometric properties, to measure a range of behavioural, symptomatic, social and impairment domains in children and adolescents.Reference Gowers, Harrington, Whitton, Lelliott, Beevor and Wing11,20 HoNOSCA is most appropriately applied to those over 4 years old.21
Developing the HoNOSI
In Australia, the National Outcomes and Casemix Collection (NOCC) was introduced in the early 2000s ‘to provide a suite of measures that support clinical practice and comparisons across services and different consumer populations’.Reference Brann, Alexander and Coombs19 The Strategic Directions 2014–2024 report22 on the NOCC implementation and its future direction identified a gap in outcome measures for infants and pre-schoolers. The Australian Child and Adolescent Mental Health Information Development Expert Advisory Panel (CAMHIDEAP) provides advice on routine outcome measures and on information initiatives to the states, territories and the Commonwealth Government.23 CAMHIDEAP developed the Health of the Nation Outcome Scales for Infants (HoNOSI)24 as a routine outcome measure for clinicians working with the emotional and social well-being of children in the 0- to 47-month age group.
The HoNOSI arose out of an international collaboration around the reliability of the HoNOSCA.Reference Hanssen-Bauer, Gowers, Aalen, Bilenberg, Brann and Garralda13 CAMHIDEAP decided the HoNOSI would parallel the structure of the HoNOSCA. A similar approach to ratings, number of scales, time frames and sources of information was expected to facilitate acceptance by clinicians who may work with both instruments and to reduce training time. A key strategic consideration was that the adoption of a new outcome measure (especially across a nation) involves substantial financial costs associated with database development and maintenance. With a similar structure, only a ‘version’ flag would need to be added for HoNOSI ratings to be recorded and extracted from the existing HoNOSCA data space, a relatively inexpensive approach.
The content of the 15 scales was initially developed by Dr Sally Merry of New Zealand, with in-principle support from key figures of the HoNOSCA reliability collaboration from the UK, Denmark, Norway and Australia. Dr Merry, with support from infant and child mental health colleagues, either paralleled HoNOSCA scales where appropriate or replaced them with more developmentally appropriate areas of concern. Continuity of outcomes could be assisted by both maintaining structural similarity and maximising content overlap where appropriate.
Face validity testing25 showed that the HoNOSI filled a gap in infant mental health outcome measurement for the 0- to 47-month age group, as no suitable instrument previously existed. Following face validity testing, the CAMHIDEAP working group identified the need for a field trial to test selected psychometric properties of the HoNOSI.
The HoNOSI field trial was designed to examine concurrent validity – how well HoNOSI ratings correlate with other measures of similar constructs.
The CAMHIDEAP nominated key clinicians across a range of Australian states who were engaged in providing mental health services to infants and pre-schoolers. These key clinicians approached their own and allied services that provided infant mental health services. Many of these clinicians had previously been involved in the face validity study.25 Services in the states that had participated in the face validity study (Queensland, New South Wales, Victoria and South Australia) were invited to participate.
Concurrent validity was assessed with patients in routine clinical care by comparing clinicians' ratings on the HoNOSI against the Parent-Infant Relationship Global Assessment Scale (PIR-GAS), Clinical Worry Rating scale and Severity Judgement Rating scale (see Psychometric properties tested section below for further details). Each infant was rated by one clinician on each of these four measures. Data was collected from five participating services across four states. Participants were given an overview of the study, including its rationale, background and aims, and an information sheet that they were asked to read before signing a consent form. Site co-ordinators emphasised that the clinicians' information would remain confidential and be analysed in aggregate, anonymous form only. Copies of the study protocol were included with the study material for co-ordinators’ and participants’ reference.
Instructions for using the HoNOSI, PIR-GAS, Clinical Worry and Severity Judgement Rating scales were included with the instruments. Additional background material and principles for rating as well as the glossary for each scale were incorporated into the HoNOSI. Participants were encouraged to ask any questions of the site co-ordinator or the project co-ordinator.
Signed consent forms and completed ratings were returned via courier to the Health Education and Training Institute for data input and analysis. In the rare event that the clinician returned a completed rating scale(s) without having completed a consent form, consent was implied via the participant's return of the completed outcome rating. No information was sought from children, infants or parents. Ethics and site-specific approval were obtained from the respective Ethics and Research and Governance Offices within each participating state – New South Wales, Queensland, Victoria and South Australia.
Psychometric properties tested
The COnsensus based Standards for the selection of health Measurement INstruments (COSMIN)26 initiative was developed to provide guidance on the selection of high-quality patient-reported outcome measures for clinical and research applications.Reference Prinsen, Mokkink, Bouter, Alonso, Patrick and de Vet27 This includes providing a methodology for assessing the content validity of patient-reported outcome measures.26,Reference Mokkink, Prinsen, Patrick, Alonso, Bouter and de Vet28 It comprises a taxonomy and definitions of measurement properties,Reference Mokkink, Terwee, Patrick, Alonso, Stratford and Knol29 checklists for assessing the methodological quality of measurement propertiesReference Mokkink, de Vet, Prinsen, Patrick, Alonso and Bouter30 and criteria for good measurement properties, against which to evaluate study results.Reference Prinsen, Mokkink, Bouter, Alonso, Patrick and de Vet27
The minimum ‘acceptable’ COSMIN standard for internal consistency, or the degree of interrelatedness among items, is 0.70.Reference Prinsen, Mokkink, Bouter, Alonso, Patrick and de Vet27 The COSMIN standard for assessing concurrent validity, or the correlation of the measure of interest with a ‘gold standard’ is 0.70.Reference Mokkink, Terwee, Patrick, Alonso, Stratford and Knol29 The ‘gold standard’ is another measure, or set of measures, that assesses a similar construct.
As no single gold standard measure of clinician-rated mental health symptoms and functioning existed at that time,31 the working group determined that the best comparison available was to test HoNOSI24 against the Parent-Infant Relationship Global Assessment Scale,32 widely used in Germany,Reference Greve, Muller, Albers, Romer and Achtergarde33,Reference Muller, Achtergarde, Frantzmann, Steinberg, Skorozhenina and Beyer34 Denmark,Reference Skovgaard, Houmann, Christiansen, Olsen, Landorph and Lichtenberg35 the USA32,Reference Thomas and Clark36 and Australia.37 As HoNOSI also covers symptom severity and perceived distress as well as functioning, two simple scales, developed by the project working group (Clinical Worry and Severity Judgement)31 were also rated.
The HoNOSI24 contains 15 single-item scales that address a range of symptoms and functioning that can occur in the infant-to-pre-school age range (see Appendix). Each scale is accompanied by a glossary outlining the range of issues covered and is rated on a 0–4-point scale ranging from ‘No problem’ to ‘Severe problem’. The guidelines allow clinicians to include all sources of information when making a rating and do not simply presume that any difficulty is located exclusively within the infant. In parallel with the HoNOSCA, the first 13 scales cover clinical areas and are summed to form a total score. Missing data is treated as zero in calculating totals. Scales 14 and 15 focus on information about the situation38 and do not contribute to the total score.
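The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the official scoring procedure: the function name `honosi_total` and the use of `None` to represent a missing rating are assumptions made for the example, and the field trial report should be consulted for the authoritative scoring instructions.

```python
from typing import Optional, Sequence

N_SCALES = 15        # the HoNOSI has 15 single-item scales
N_TOTAL_SCALES = 13  # only scales 1-13 contribute to the total score


def honosi_total(ratings: Sequence[Optional[int]]) -> int:
    """Sum scales 1-13; a missing rating (None, an assumed encoding) counts as zero."""
    if len(ratings) != N_SCALES:
        raise ValueError("expected 15 ratings, one per scale")
    total = 0
    for rating in ratings[:N_TOTAL_SCALES]:
        if rating is None:
            continue  # missing data is treated as zero in the total
        if not 0 <= rating <= 4:
            raise ValueError("each rating must be between 0 and 4")
        total += rating
    return total


# Hypothetical case: scale 13 is missing; scales 14 and 15 are ignored.
example = [2, 0, 0, 1, 0, 0, 0, 3, 2, 1, 0, 3, None, 2, 1]
print(honosi_total(example))  # 12
```

Because 13 scales are each rated from 0 to 4, totals computed this way can range from 0 to 52.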
Clinicians’ ratings on the HoNOS family of measures can be categorised as ‘clinically significant’ if a problem area is rated as a mild, moderate or severe to very severe problem (i.e. a rating of 2, 3 or 4) or ‘clinically not significant’ for ratings of 0 or 1.Reference Burgess, Trauer, Coombs, McKay and Pirkis39 Full details on study procedures and HoNOSI scoring instructions are available in the HoNOSI field trial report.31
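As a rough illustration of this classification, the following Python sketch flags ratings of 2 or above and computes the share of cases flagged on a single scale. The function names and the example ratings are hypothetical and are not study data.

```python
from typing import Sequence

CLINICAL_CUTOFF = 2  # ratings of 2, 3 or 4 count as clinically significant


def is_clinically_significant(rating: int) -> bool:
    """Return True for a rating of 2, 3 or 4 on the 0-4 scale."""
    if not 0 <= rating <= 4:
        raise ValueError("rating must be between 0 and 4")
    return rating >= CLINICAL_CUTOFF


def proportion_significant(ratings: Sequence[int]) -> float:
    """Share of cases rated as clinically significant on one scale."""
    if not ratings:
        raise ValueError("no ratings supplied")
    flagged = sum(1 for r in ratings if is_clinically_significant(r))
    return flagged / len(ratings)


# Hypothetical ratings for one scale across eight cases.
scale_ratings = [0, 1, 2, 3, 4, 2, 0, 1]
print(proportion_significant(scale_ratings))  # 0.5
```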
The PIR-GAS32 is a measure of the quality of the parent–infant relationship.32 Clinicians assess the intensity, frequency and duration of difficulties on a 100-point rating scale, usually reported in deciles, that ranges from 1–10 Documented Maltreatment to 91–100 Well Adapted.
The Clinical Worry Rating,31 a seven-point rating scale, developed by the HoNOSI project working group, asks the clinician to rate: ‘Overall, how concerned are you about this infant?’. The Severity Judgement Rating,31 also a seven-point rating scale developed by the working group, asks the clinician to rate: ‘In your clinical judgement, how severe do you consider the infant's overall social and emotional problems?’. Both the Clinical Worry Rating and the Severity Judgement Rating scales were designed to be unidirectional, from 0 (Not worried/No problem) to 6 (Extreme/Severe).
Based on the directionality of the measures, the previously discussed COSMIN standards indicate that adequate concurrent validity would be achieved if the HoNOSI had a statistically significant correlation of −0.70 or stronger (i.e. rs ≤ −0.70) with the PIR-GAS, on which higher ratings indicate a better-adapted relationship, and of at least 0.70 with the Clinical Worry and the Severity Judgement Rating scales.
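The decision rule in the preceding paragraph can be expressed as a small check. The sketch below is an assumption-laden illustration: the function name `meets_criterion`, the `expected_sign` parameter and the example correlations are hypothetical, and the 0.05 significance level is the one stated for the study's analyses.

```python
COSMIN_THRESHOLD = 0.70  # minimum acceptable strength of correlation
ALPHA = 0.05             # significance level used in the study's analyses


def meets_criterion(r_s: float, p_value: float, expected_sign: int) -> bool:
    """Check a correlation against the threshold in the expected direction.

    expected_sign is -1 for the PIR-GAS (higher scores are better) and
    +1 for the Clinical Worry and Severity Judgement Rating scales.
    """
    if expected_sign not in (-1, 1):
        raise ValueError("expected_sign must be -1 or +1")
    strong_enough = expected_sign * r_s >= COSMIN_THRESHOLD
    return p_value < ALPHA and strong_enough


# Hypothetical values, not results from the field trial.
print(meets_criterion(-0.75, 0.001, expected_sign=-1))  # True
print(meets_criterion(0.65, 0.001, expected_sign=1))    # False
```

Multiplying the coefficient by the expected sign lets one threshold serve both directions, so a strong correlation in the wrong direction does not pass.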
Data collection and analysis
Data was collected from five participating services within four states across Australia. The analysis dataset consisted of 108 completed clinical cases. A HoNOSI ‘item severity structure’ index was derived using the method described by Gowers et al (1999)Reference Gowers, Harrington, Whitton, Lelliott, Beevor and Wing11 with respect to the HoNOSCA. Statistical analyses were performed using SPSS Version 2440 and Stata Version 14.2.41
Using a combination of jurisdiction, profession and years’ experience, it is estimated that 26 clinicians participated. The number of infants rated per clinician varied from 1 to 14 with a mode of 3.5. The number of infants rated across the five sites ranged from 6 to 55 with two services being responsible for 78% of the infants rated. All statistical analyses used a type I error rate of α = 0.05 and the associated probabilities are reported.
Over half of the clinicians were either psychologists or social workers and these two professions completed approximately two-thirds of all ratings. Table 1 shows the estimated number of clinicians and the number of ratings completed by profession type.
Over 61% of clinicians rating the cases had over 5 years’ experience and of those, 23% had clinical experience of over 10 years; these clinicians rated more than 75% of the cases. Table 2 shows the estimated number of clinicians and the number of ratings completed by the clinicians’ years of experience.
Basic demographic data were collected regarding the age and gender of the infant. There were slightly more male (52.8%) than female infants (47.2%). The age distributions differed; male infants were somewhat older than female infants, with median ages of 16 and 10 months, respectively. Table 3 shows the age distribution of the infants.
On PIR-GAS, some infants were classified as Adapted Relationship but most had Features of a Disordered Relationship (PIR-GAS rating 41–80; 49.1%) or a Disordered Relationship (PIR-GAS rating 1–40; 42.3%).32 Table 4 shows the distribution of PIR-GAS ratings.
a. There were four non-responses to the PIR-GAS.
Around half of the infants were rated as either a ‘3’ or a ‘4’ on Clinical Worry (51.0%) and somewhat fewer on Severity Judgement (43.6%). On these seven-point rating scales, zero denotes Not Concerned (Clinical Worry) or No Problem (Severity Judgement) and six denotes Extremely Concerned or Extremely Severe Problem, respectively (Table 5).
Distribution of HoNOSI ratings
The frequency distribution of severity ratings for each of the 15 HoNOSI scales is presented in Fig. 1.
There were no missing HoNOSI ratings and no infants received a rating of Not known/Not applicable. Of the 108 cases, all five rating points were used for 12 of the 15 HoNOSI scales; the most severe rating of four was not used for Scale 6 Problems with physical illness or disability, Scale 7 Problems associated with regulation and integration of sensory processing and Scale 11 Problems with age appropriate self-care and environmental exploration. For six scales, over half of all ratings were rated zero indicating ‘No problems/Issues’ (Scale 3 Non-accidental self-injury or lack of self-protective behaviours, Scale 5 Problems with developmental delays, Scale 6 Problems with physical illness or disability, Scale 7 Problems associated with regulation and integration of sensory processing, Scale 11 Problems with age appropriate self-care and environmental exploration and Scale 13 Problems with attending care, education and socialisation settings).
Using the HoNOS family of measures classification where a rating of 2, 3 or 4 is classified as clinically significant, with respect to the 108 cases, 75% were rated as having clinically significant problems with Scale 12 Problems with family life and relationships, 55% with Scale 9 Problems with emotional and related symptoms or over-controlled emotional regulation, 52% with Scale 1 Problems with disruptive behaviour/irritability/under controlled emotional regulation and 51% with Scale 8 Problems associated with sleep. The scales least frequently rated as clinically significant were Scale 3 Non-accidental self-injury or lack of self-protective behaviours (10%), Scale 6 Problems with physical illness or disability (10%) and Scale 7 Problems associated with regulation and integration of sensory processing (14%).
Clinically significant problems were also found for 61% on Scale 14 Problems with knowledge or understanding about the nature of the infant's difficulties and for 43% on Scale 15 Problems with lack of information, understanding about services, or managing the infant's difficulties.
Figure 2 presents the distribution of HoNOSI total scores (sum of the ratings of the first 13 scales). HoNOSI total scores ranged from 0 through 42, with a mean and median of 14.0 and an interquartile range of 12 points (i.e. the middle 50% of total scores were within the range 7 through 19) (Table 6). Analysis of the distribution of the total scores did not reveal any significant deviation from normality.
The level of internal consistency of the 13 scales comprising the total score, as measured by Cronbach's α, is 0.87.
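For readers unfamiliar with the statistic, Cronbach's α for k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. The sketch below implements that standard formula with Python's standard library. The study itself used SPSS and Stata, so this is not the authors' code, and the toy data are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(cases: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha, where cases[i][j] is the rating of case i on item j."""
    if len(cases) < 2:
        raise ValueError("at least two cases are required")
    k = len(cases[0])
    if k < 2:
        raise ValueError("at least two items are required")
    if any(len(row) != k for row in cases):
        raise ValueError("every case must have the same number of items")
    # Sample variance of each item across cases.
    item_variances = [variance([row[j] for row in cases]) for j in range(k)]
    # Sample variance of the summed (total) score across cases.
    total_variance = variance([sum(row) for row in cases])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data: two items that move together perfectly give alpha = 1.
toy_cases = [[1, 1], [2, 2], [3, 3]]
print(cronbach_alpha(toy_cases))  # 1.0
```

In the field trial the input would be a 108 by 13 table, one row per case and one column for each scale that contributes to the total score.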
Collins et al (2016) define floor and ceiling effects for measures reporting total scores as ‘the percentage of respondents with the lowest possible score (floor effects) and the highest possible score (ceiling effects)’.Reference Collins, Prinsen, Christensen, Bartels, Terwee and Roos42 Floor and ceiling effects are not considered to be present if fewer than 15% of participants score the lowest or the highest possible score. There was no evidence of these effects in the HoNOSI total scores (Fig. 2). Only six cases (5.6%) had a HoNOSI total score of zero. No case reached the highest possible total of 52 (13 scales each rated 0–4); the highest observed total, 42, was recorded for only one case (0.9%).
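The floor and ceiling check amounts to two proportions compared against the 15% criterion. A minimal Python sketch follows, assuming a highest possible total of 52 (13 summed scales, each rated 0 to 4); the function names and the example totals are hypothetical.

```python
from typing import Sequence, Tuple

MIN_TOTAL = 0
MAX_TOTAL = 13 * 4       # 13 summed scales, each rated 0-4
EFFECT_THRESHOLD = 0.15  # the 15% criterion


def floor_ceiling(totals: Sequence[int]) -> Tuple[float, float]:
    """Return the proportions of cases at the lowest and highest possible totals."""
    if not totals:
        raise ValueError("no total scores supplied")
    n = len(totals)
    floor = sum(1 for t in totals if t == MIN_TOTAL) / n
    ceiling = sum(1 for t in totals if t == MAX_TOTAL) / n
    return floor, ceiling


def has_effect(proportion: float) -> bool:
    """An effect is flagged when more than 15% of cases sit at the extreme."""
    return proportion > EFFECT_THRESHOLD


# Hypothetical totals for ten cases, not study data.
totals = [0, 7, 14, 19, 42, 0, 12, 52, 9, 14]
floor, ceiling = floor_ceiling(totals)
print(floor, ceiling)                         # 0.2 0.1
print(has_effect(floor), has_effect(ceiling)) # True False
```

Applied to the reported figures, six cases out of 108 at a total of zero gives a floor proportion of about 5.6%, well under the criterion.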
In terms of item severity structure, it is important to note that more than 80% of cases had at least one HoNOSI scale problem area rated as clinically significant. Table 7 shows details on the item severity structure index by the mean HoNOSI total score.
Spearman's rank order correlation was used to test the concurrent validity of the HoNOSI total score with the PIR-GAS, Clinical Worry and Severity Judgement Rating scales and the results are presented in Table 8. A comprehensive intercorrelation analysis of the 15 individual HoNOSI scales is provided in Supplementary Table 1, available at https://doi.org/10.1192/bjo.2021.951. It shows that all 15 HoNOSI scale correlations with the PIR-GAS, Clinical Worry and Severity Judgement Ratings are statistically significant at P < 0.001, with the one exception of Scale 6 correlated against the Clinical Worry Rating, which is statistically significant only at the less stringent threshold of P < 0.05. The HoNOSI total score correlations summary table (Table 8) shows the three validity measures correlated against the HoNOSI total score. It is also important to note that the three concurrent validity measures are strongly intercorrelated (all P < 0.001): PIR-GAS with Clinical Worry, rs = −0.81; PIR-GAS with Severity Judgement, rs = −0.76; and Clinical Worry with Severity Judgement, rs = 0.81.
a. All correlations were statistically significant (P < 0.001).
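Spearman's rank order correlation is the Pearson correlation computed on ranks, with tied values sharing the mean of the ranks they occupy. The following standard-library Python sketch shows the calculation. It is not the authors' code (the study used SPSS and Stata), it does not compute P values, and the example scores are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson's r applied to the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (ss_x * ss_y) ** 0.5


# Hypothetical scores: higher HoNOSI totals paired with lower PIR-GAS ratings.
honosi_totals = [5, 12, 20, 30]
pir_gas = [85, 60, 45, 20]
print(spearman(honosi_totals, pir_gas))  # -1.0
```

The negative sign in this toy example reflects the opposite scoring directions of the two instruments, which is why the concurrent validity criterion for the PIR-GAS is stated as a negative correlation.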
This study was designed specifically to establish the level of evidence of concurrent validity with respect to the 15 HoNOSI scales and the HoNOSI total severity score. In order to test concurrent validity, in the absence of a gold standard, the HoNOSI was compared with other measures that measure similar constructs: the PIR-GAS, Clinical Worry and Severity Judgement Rating scales.
The level of internal consistency of the 13 scales comprising the total score, as measured by Cronbach's α, is 0.87, which well exceeds the COSMIN threshold of 0.70. It should also be noted that the evaluation of the concurrent validity of HoNOSI was based on three independent measures. The three concurrent validity measures are strongly intercorrelated, suggesting a high degree of construct congruence. Using the COSMIN criteria, there is evidence for HoNOSI having ‘adequate’ concurrent validity, as assessed by correlations with the PIR-GAS, Clinical Worry and Severity Judgement Rating scales.
More than 80% of cases had at least one HoNOSI scale problem area rated as clinically significant. This finding suggests that the overall clinical severity of these 108 cases is likely representative of very young consumers seen in specialised public sector mental health services. This was not a sample that was symptom free.
Examining ratings of individual scales, no infants received a rating of Not known, nor were there any missing ratings. This suggests that all 15 scales were able to be used. The most severe rating of 4 was not used for three of the 15 scales: Scale 6 Problems with physical illness or disability, Scale 7 Problems associated with regulation and integration of sensory processing and Scale 11 Problems with age appropriate self-care and environmental exploration. It could be that, in this sample of 108 infants, there were no cases with a Severe to very severe problem in these HoNOSI problem areas. Alternatively, it could be that the glossaries for these scales make a rating of 4 unlikely to be used. Future work could further explore these particular scales in another sample.
Future research could also explore HoNOSI validity with respect to other domains and consumer attributes, including the specific nature of presenting problems and diagnostic categories. Only relatively brief written training, embedded in the HoNOSI, was provided in this study. Although this brief approach may be seen to mirror what clinicians receive in real-world settings after any initial implementation,Reference Brann, Coleman and Luk43 the impact of additional training on the performance of HoNOSI would be worth exploring. For the adult routine outcome measure, the type of training required and its capacity to improve psychometric and clinician performance have been areas of longstanding debate.Reference Rock and Preston44,Reference Coombs, Trauer and Eagar45
There are other psychometric properties (for example, sensitivity to change) yet to be investigated. A face validity study,25 an interrater reliability studyReference Brann, Culjak, Kowalenko, Dickson, Coombs and Sved Williams46 and this concurrent validity field trial have now been completed. The findings have been sufficiently encouraging to support controlled implementation of the HoNOSI.
Supplementary material is available online at https://doi.org/10.1192/bjo.2021.951.
The study was undertaken by the Child and Adolescent Mental Health Information Development Expert Advisory Panel (CAMHIDEAP), supported by the Australian Mental Health Outcomes and Classification Network (AMHOCN). CAMHIDEAP and AMHOCN have been funded by the Australian Government Department of Health through a contract with the Health Education and Training Institute.
The data that support the findings of this study are available from the corresponding author (G.C.) upon reasonable request.
All authors contributed to the conception and design of the study. G.C. conducted the field trial, initial analysis, interpretation of data and wrote the draft manuscript. G.C. and P.B. conducted the final analysis and prepared the final manuscript. All authors reviewed and approved the final version of the manuscript.
Declaration of interest
Health of the Nation Outcome Scales for Infants