  • Print publication year: 2014
  • Online publication date: October 2014

11 - Estimation, calibration, and validation


Essentially, all models are wrong, but some are useful.

George E. P. Box


As discussed in Chapter 8, ‘good decision analyses depend on both the veracity of the decision model and the validity of the individual data elements.’ The validity of each individual data element relies on the comprehensiveness of the literature search for the best and most appropriate study or studies, criteria for selecting the source studies, the design of the study or studies, and methods for synthesizing the data from multiple sources. Nonetheless, Sir Michael David Rawlins avers that ‘Decision makers have to incorporate judgements, as part of their appraisal of the evidence, in reaching their conclusions. Such judgements relate to the extent to which each of the components of the evidence base is “fit for purpose.” Is it reliable?’ (1) Because the integration of a multitude of these ‘best available’ data elements forms the basis for model results, some individuals refer to decision analyses as black boxes, so this last question applies particularly to the overall model predictions. Consequently, assessing model validity becomes paramount. However, prior to assessing model validity, model construction requires attention to parameter estimation and model calibration. This chapter focuses on parameter estimation, calibration, and validation in the context of Markov and, more generally, state-transition models (Chapter 10) in which recurrent events may occur over an extended period. The process of parameter estimation, calibration, and validation is iterative: it involves both adjustment of the data to fit the model and adjustment of the model to fit the data.

Parameter estimation

Survival analysis involves determining the probability that an event such as death or disease progression will occur over time. The events modeled in survival analysis are called ‘failure’ events, because once they occur, they cannot occur again. ‘Survival’ is the absence of the failure event. The failure event may be death, or it may be death combined with a non-fatal outcome such as developing cancer or having a heart attack, in which case the absence of the event is referred to as event-free survival. Commonly used methods for survival analysis include life-table analysis, Kaplan–Meier product limit estimates, and Cox proportional hazards models. A survival curve plots the probability of remaining free of the failure event (for example, being alive) as a function of time (Figure 11.1).
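To make the Kaplan–Meier product limit estimate concrete, the following is a minimal sketch in Python rather than code from the chapter. The function name `kaplan_meier` and the small data set are hypothetical and chosen only for illustration. The sketch assumes the usual convention that subjects censored at a given time are still counted as at risk for failure events occurring at that same time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier product limit estimate (illustrative sketch).

    times  -- follow-up time for each subject
    events -- 1 if the failure event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) pairs, one per distinct
    time at which at least one failure event occurred.
    """
    failures = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone who leaves the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = failures.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Failures and censored subjects both leave the risk set after t.
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up data (arbitrary time units), for illustration only.
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 1, 0, 0, 1]

print([(t, round(s, 3)) for t, s in kaplan_meier(times, events)])
# prints [(2, 0.875), (3, 0.75), (5, 0.6), (8, 0.45), (12, 0.0)]
```

Each step of the resulting curve multiplies the running survival probability by the fraction of at-risk subjects who did not fail at that time, which is why censored subjects reduce the size of the risk set without producing a drop in the curve.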

Rawlins, M. De testimonio: on the evidence for decisions about the use of therapeutic interventions. Lancet. 2008;372(9656):2152–61.
Bradburn, MJ, Clark, TG, Love, SB, Altman, DG. Survival analysis part II: multivariate data analysis – an introduction to concepts and methods. Br J Cancer. 2003;89(3):431–6.
Kleinbaum, DG, Klein, M. Survival Analysis: A Self-learning Text. 2nd edn: Springer; 2005.
Beck, JR, Kassirer, JP, Pauker, SG. A convenient approximation of life expectancy (the “DEALE”). I. Validation of the method. Am J Med. 1982;73(6):883–8.
Cuchural, GJ, Levey, AS, Pauker, SG. Kidney failure or cancer. Should immunosuppression be continued in a transplant patient with malignant melanoma? Med Decis Making. 1984;4(1):82–107.
Arias, E. United States Life Tables, 2008. National Vital Statistics Reports. Hyattsville, MD: National Center for Health Statistics; 2012.
Kuntz, KM, Weinstein, MC. Life expectancy biases in clinical decision modeling. Med Decis Making. 1995;15(2):158–69.
Weinstein, MC. Recent developments in decision-analytic modelling for economic evaluation. Pharmacoeconomics. 2006;24(11):1043–53.
Barton, P, Jobanputra, P, Wilson, J, Bryan, S, Burls, A. The use of modelling to evaluate new drugs for patients with a chronic condition: the case of antibodies against tumour necrosis factor in rheumatoid arthritis. Health Technol Assess. 2004;8(11):iii, 1–91.
Mark, DB, Hlatky, MA, Califf, RM, et al. Cost effectiveness of thrombolytic therapy with tissue plasminogen activator as compared with streptokinase for acute myocardial infarction. N Engl J Med. 1995;332(21):1418–24.
Eckman, MH, Rosand, J, Greenberg, SM, Gage, BF. Cost-effectiveness of using pharmacogenetic information in warfarin dosing for patients with nonvalvular atrial fibrillation. Ann Intern Med. 2009;150(2):73–83.
Vanni, T, Karnon, J, Madan, J, et al. Calibrating models in economic evaluation: a seven-step approach. Pharmacoeconomics. 2011;29(1):35–49.
Kim, JJ, Kuntz, KM, Stout, NK, et al. Multiparameter calibration of a natural history model of cervical cancer. Am J Epidemiol. 2007;166(2):137–50.
Taylor, DC, Pawar, V, Kruzikas, D, et al. Calibrating longitudinal models to cross-sectional data: the effect of temporal changes in health practices. Value Health. 2011;14(5):700–4.
Taylor, DC, Pawar, V, Kruzikas, DT, et al. Incorporating calibrated model parameters into sensitivity analyses: deterministic and probabilistic approaches. Pharmacoeconomics. 2012;30(2):119–26.
Taylor, DC, Pawar, V, Kruzikas, D, et al. Methods of model calibration: observations from a mathematical model of cervical cancer. Pharmacoeconomics. 2010;28(11):995–1000.
Kong, CY, McMahon, PM, Gazelle, GS. Calibration of disease simulation model using an engineering approach. Value Health. 2009;12(4):521–9.
Rutter, CM, Miglioretti, DL, Savarino, JE. Bayesian calibration of microsimulation models. J Am Stat Assoc. 2009;104(488):1338–50.
Stout, NK, Knudsen, AB, Kong, CY, McMahon, PM, Gazelle, GS. Calibration methods used in cancer simulation models and suggested reporting guidelines. Pharmacoeconomics. 2009;27(7):533–45.
Garnett, GP, Cousens, S, Hallett, TB, Steketee, R, Walker, N. Mathematical models in the evaluation of health programmes. Lancet. 2011;378(9790):515–25.
Nijhuis, RL, Stijnen, T, Peeters, A, et al. Apparent and internal validity of a Monte Carlo-Markov model for cardiovascular disease in a cohort follow-up study. Med Decis Making. 2006;26(2):134–44.
van Kempen, BJ, Ferket, BS, Hofman, A, et al. Validation of a model to investigate the effects of modifying cardiovascular disease (CVD) risk factors on the burden of CVD: the Rotterdam ischemic heart disease and stroke computer simulation (RISC) model. BMC Med. 2012;10:158.
Fryback, DG, Stout, NK, Rosenberg, MA, et al. The Wisconsin Breast Cancer Epidemiology Simulation Model. J Natl Cancer Inst Monogr. 2006;(36):37–47.
Chia, YL, Salzman, P, Plevritis, SK, Glynn, PW. Simulation-based parameter estimation for complex models: a breast cancer natural history modelling illustration. Stat Methods Med Res. 2004;13(6):507–24.
Wong, JB, Koff, RS. Watchful waiting with periodic liver biopsy versus immediate empirical therapy for histologically mild chronic hepatitis C. A cost-effectiveness analysis. Ann Intern Med. 2000;133(9):665–75.
Provenzale, D, Schmitt, C, Wong, JB. Barrett’s esophagus: a new look at surveillance based on emerging estimates of cancer risk. Am J Gastroenterol. 1999;94(8):2043–53.
Whyte, S, Walsh, C, Chilcott, J. Bayesian calibration of a natural history model with application to a population model for colorectal cancer. Med Decis Making. 2011;31(4):625–41.
Kennedy, MC, O’Hagan, A. Bayesian calibration of computer models. J R Stat Soc: Series B (Statistical Methodology). 2001;63(3):425–64.
Kim, JJ, Ortendahl, J, Goldie, SJ. Cost-effectiveness of human papillomavirus vaccination and cervical cancer screening in women older than 30 years in the United States. Ann Intern Med. 2009;151(8):538–45.
Eddy, DM, Hollingworth, W, Caro, JJ, et al. Model transparency and validation: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-7. Med Decis Making. 2012;32(5):733–43.
Roberts, M, Russell, LB, Paltiel, AD, et al. Conceptualizing a model: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-2. Med Decis Making. 2012;32(5):678–89.
Hunink, MG, Goldman, L, Tosteson, AN, et al. The recent decline in mortality from coronary heart disease, 1980–1990. The effect of secular trends in risk factors and treatment. JAMA. 1997;277(7):535–42.
Wong, JB. Pharmacogenomics of hepatitis C and decision analysis: a glimpse into the future. Hepatology. 2002;36(1):252–4.
Zauber, AG, Lansdorp-Vogelaar, I, Knudsen, AB, et al. Evaluating test strategies for colorectal cancer screening: a decision analysis for the U.S. Preventive Services Task Force. Ann Intern Med. 2008;149(9):659–69.
Berry, DA, Cronin, KA, Plevritis, SK, et al. Effect of screening and adjuvant therapy on mortality from breast cancer. N Engl J Med. 2005;353(17):1784–92.
Drummond, MF, Barbieri, M, Wong, JB. Analytic choices in economic models of treatments for rheumatoid arthritis: What makes a difference? Med Decis Making. 2005;25(5):520–33.
Turner, D, Raftery, J, Cooper, K, et al. The CHD challenge: comparing four cost-effectiveness models. Value Health. 2011;14(1):53–60.
Kim, LG, Thompson, SG. Uncertainty and validation of health economic decision models. Health Econ. 2010;19(1):43–55.