Comparison of three instruments assessing the quality of economic evaluations: A practical exercise on economic evaluations of the surgical treatment of obesity

  • Sophie Gerkens (a1), Ralph Crott (a2), Irina Cleemput (a3), Jean-Paul Thissen (a4), Marie-Christine Closon (a1), Yves Horsmans (a4) and Claire Beguin (a2)...


Objectives: The increasing use of full economic evaluations has led to the development of various instruments to assess their quality. The purpose of this study was to compare the frequently used British Medical Journal (BMJ) check-list and two new instruments: the Consensus Health Economic Criteria (CHEC) list and the Quality of Health Economic Studies (QHES) instrument. The analysis was based on a practical exercise on economic evaluations of the surgical treatment of obesity.

Methods: The quality of nine selected studies was assessed independently by two health economists. To compare the instruments, Spearman's rank correlation coefficient between the instruments' scores was calculated for each assessor. The test–retest reliability of each instrument was assessed with the intraclass correlation coefficient ICC(3,1). Finally, the inter-rater agreement for each instrument was estimated at two levels: agreement on the total score of each article, using ICC(2,1), and agreement on individual items, using kappa values.
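The abstract does not say which software or formulas were used. The block below is a minimal, dependency-free Python sketch of the standard statistics it names, assuming the usual definitions: Spearman's rho as the Pearson correlation of average ranks, the single-measure ICC(2,1) (absolute agreement) and ICC(3,1) (consistency) computed from a two-way ANOVA in the form usually attributed to Shrout and Fleiss, and Cohen's kappa for two raters. The function names (`average_ranks`, `spearman_rho`, `icc`, `cohen_kappa`) and the data layout `ratings[i][j]` (score of article i from rater or session j) are hypothetical, chosen for illustration.

```python
from collections import Counter
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def icc(ratings, form):
    """Single-measure ICC from a two-way ANOVA (hypothetical helper).

    ratings[i][j] is the total score of article i from rater (or session) j.
    form "2,1": two-way random effects, absolute agreement (inter-rater).
    form "3,1": two-way mixed effects, consistency (e.g. test-retest).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    if form == "3,1":
        return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    if form == "2,1":
        # Rater (column) variance counts against absolute agreement.
        return (ms_rows - ms_error) / (
            ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
        )
    raise ValueError("form must be '2,1' or '3,1'")


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' categorical answers to one item."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_expected = sum(ca[c] * cb[c] for c in ca) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # undefined: both raters used one identical category
    return (p_observed - p_expected) / (1 - p_expected)
```

A rater who scores every article exactly one point higher than the other illustrates the difference between the two ICC forms: ICC(3,1) treats the raters as perfectly consistent, while ICC(2,1) penalizes the systematic offset. The sketch does not compute the 95 percent confidence intervals reported below; a statistics package would normally be used for those.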

Results: Spearman's rank correlation coefficient between instruments was usually high (rho > 0.70). Test–retest reliability was good for every instrument: 0.98 (95 percent CI, 0.86–0.99) for the BMJ check-list, 0.97 (95 percent CI, 0.73–0.98) for the CHEC list, and 0.95 (95 percent CI, 0.75–0.99) for the QHES instrument. However, inter-rater agreement was poor (kappa < 0.40 for most items and ICC(2,1) ≤ 0.5).

Conclusions: The study shows that the results of quality assessments of economic evaluations are influenced less by the instrument used than by the assessor. Quality assessments should therefore be performed by at least two independent experts, with the final score based on consensus.



