## Appendices

**A. Integration**

We need to evaluate the integrals in equations (1–3). There are several approaches which could be adopted when the function to be integrated can be evaluated at any point, such as adaptive quadrature; see Press *et al*. (2005, page 133) for details of this and other methods. However, since we only have data at integer ages, $${}_{t}p_{x,y}$$ can only be calculated at equally spaced grid points. Since we cannot evaluate the function to be integrated at any point we like, we maximise our accuracy by using the following approximations.

For two points separated by one year we use the Trapezoidal Rule, writing *f* for the function to be integrated and taking the grid points at unit spacing:

$$\int_{0}^{1} f(t)\,dt \approx \frac{1}{2}\left[f(0)+f(1)\right]$$

For three points spaced one year apart we use Simpson’s Rule:

$$\int_{0}^{2} f(t)\,dt \approx \frac{1}{3}\left[f(0)+4f(1)+f(2)\right]$$

For four points spaced one year apart we use Simpson’s 3/8 Rule:

$$\int_{0}^{3} f(t)\,dt \approx \frac{3}{8}\left[f(0)+3f(1)+3f(2)+f(3)\right]$$

For five points spaced one year apart we use Boole’s Rule:

$$\int_{0}^{4} f(t)\,dt \approx \frac{2}{45}\left[7f(0)+32f(1)+12f(2)+32f(3)+7f(4)\right]$$

To integrate over *n* equally spaced grid points we first apply Boole’s Rule as many times as possible, then Simpson’s 3/8 Rule, then Simpson’s Rule and then the Trapezoidal Rule for any remaining points at the highest ages.
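The composite scheme can be sketched in Python as follows; the function name and list-based interface are illustrative only, with unit grid spacing assumed as in the text.

```python
def composite_integral(f_vals, h=1.0):
    """Integrate values tabulated at equally spaced grid points, applying
    Boole's Rule as many times as possible, then Simpson's 3/8 Rule,
    Simpson's Rule or the Trapezoidal Rule for the remaining points."""
    total, i, n = 0.0, 0, len(f_vals)
    while n - i >= 5:  # Boole's Rule: five points spanning four intervals
        f0, f1, f2, f3, f4 = f_vals[i:i + 5]
        total += 2 * h / 45 * (7 * f0 + 32 * f1 + 12 * f2 + 32 * f3 + 7 * f4)
        i += 4
    rem = n - i  # 1 to 4 points remain, sharing the endpoint f_vals[i]
    if rem == 4:    # Simpson's 3/8 Rule
        f0, f1, f2, f3 = f_vals[i:i + 4]
        total += 3 * h / 8 * (f0 + 3 * f1 + 3 * f2 + f3)
    elif rem == 3:  # Simpson's Rule
        f0, f1, f2 = f_vals[i:i + 3]
        total += h / 3 * (f0 + 4 * f1 + f2)
    elif rem == 2:  # Trapezoidal Rule
        total += h / 2 * (f_vals[i] + f_vals[i + 1])
    return total
```

Boole’s Rule is exact for polynomials of degree up to five, so, for example, nine tabulated values of *t*³ integrate to exactly 8⁴/4=1024.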

**B. Corner Cohorts**

One issue with the APC and APCI models is that the cohort terms can have widely varying numbers of observations, as illustrated in Figure A1; at the extremes, the oldest and youngest cohorts have just a single observation each. A direct consequence of this limited data is that any estimated *γ* term for the corner cohorts will have a very high variance, as shown in Figure A2. Cairns *et al*. (2009) dealt with this by simply discarding the data in the triangles in Figure A1, i.e., where a cohort had four or fewer observations. Instead of the oldest cohort having year of birth *y*_{min}−*x*_{max}, for example, it becomes *c*_{min}=*y*_{min}−*x*_{max}+4. Similarly, the youngest cohort has year of birth *c*_{max}=*y*_{max}−*x*_{min}−4 instead of *y*_{max}−*x*_{min}.

Figure A1 Number of observations for each cohort in the data region

Figure A2 Standard errors of $$\hat{\gamma}_{y-x}$$ for the APCI(S) model with and without estimation of corner cohorts. The $$\hat{\gamma}_{y-x}$$ terms for the corner cohorts in Figure A1 have very high standard errors; not estimating them has the additional benefit of stabilising the standard errors of those cohort terms we do estimate.

There is a drawback to the approach of Cairns *et al*. (2009), namely that it makes it harder to compare model fits. We typically use an information criterion to compare models, such as the AIC or BIC. However, this is only valid where the data used are the same: if two models use different data, then their information criteria cannot be compared. This would be a problem for comparing the models in Tables 2, 3 and 12, for example, as the fit for an APC or APCI model could not be compared with the fits for the Age–Period and Lee–Carter models if corner cohorts were only dropped for some models. One approach would be to make the data the same by dropping the corner cohorts for the Age–Period and Lee–Carter fits, even though this is technically unnecessary. This is far from ideal, however, as it involves throwing away data, and the same truncation would have to be applied to every other model without cohort terms.

An alternative approach is to use all the data, but to simply not fit cohort terms in the corners of Figure 1. This preserves the easy comparability of information criteria between different model fits. To avoid fitting cohort terms where they are too volatile we simply assume a value of *γ*=0 where there are four or fewer observations. This means that the same data are used for models with and without cohort terms, and thus that model fits can be directly compared via the BIC. Currie (2013) noted that this had the beneficial side effect of stabilising the variance of the cohort terms which are estimated, as shown in Figure A2.
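The observation counts behind this rule can be sketched directly; the age and calendar-year ranges below are assumed purely for illustration.

```python
from collections import Counter

# Observation counts per cohort c = y - x on a rectangular grid of ages
# and calendar years; the ranges are assumed purely for illustration.
ages = range(50, 105)      # x_min = 50, ..., x_max = 104
years = range(1971, 2016)  # y_min = 1971, ..., y_max = 2015
obs = Counter(y - x for x in ages for y in years)

# Rule from the text: no gamma is estimated for cohorts with four or
# fewer observations; those gamma terms are simply set to zero.
estimated = sorted(c for c, n in obs.items() if n > 4)
```

Under these assumed ranges the oldest cohort, 1971−104=1867, has a single observation, and the oldest cohort with an estimated *γ* is 1867+4=1871, matching the corner-cohort boundaries described above.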

For projections of *γ* we forecast not only for the unobserved cohorts, but also for the cohorts with too few observations, i.e., the cohorts in the dotted triangle in Figure A1.

**C. Identifiability Constraints**

The models in equations (5)–(8) all require identifiability constraints. For the Age–Period model we require one constraint, and we will use the following:

$$\sum\nolimits_{y} \kappa_{y} = 0 \qquad (17)$$

For the Lee–Carter model we require two constraints. For one of them we will use the same constraint as equation (17), together with the usual constraint on *β*_{x} from Lee and Carter (1992):

$$\sum\nolimits_{x} \beta_{x} = 1 \qquad (18)$$

There are numerous alternative constraint systems for the Lee–Carter model – see Girosi and King (2008), Renshaw and Haberman (2006) and Richards and Currie (2009) for examples. The choice of constraint system will affect the estimated parameter values, but will not change the fitted values of $$\hat{\mu}_{x,y}$$.

For the APC model we require three constraints. For the first one we will use the same constraint as equation (17), together with the following two from Cairns *et al*. (2009):

$$\sum\nolimits_{c} w_{c}\,\gamma_{c} = 0 \qquad (19)$$

$$\sum\nolimits_{c} w_{c}\,c\,\gamma_{c} = 0 \qquad (20)$$

where *w*_{c} is the number of times cohort *c* appears in the data. Continuous Mortality Investigation (2016b) uses unweighted cohort constraints, i.e., *w*_{c}=1, ∀*c*, but we prefer to use the constraints of Cairns *et al*. (2009), as they give less weight to years of birth with less data.

For the APCI model we require five constraints. We will use equations (17), (19) and (20), together with the following additional two:

$$\sum\nolimits_{y} y\,\kappa_{y} = 0 \qquad (21)$$

$$\sum\nolimits_{c} w_{c}\,c^{2}\,\gamma_{c} = 0 \qquad (22)$$

where equation (22) is the continuation of the pattern in equations (19) and (20) established by Cairns *et al*. (2009).

The number of constraints necessary for a linear model can be determined from the rank of the model matrix. Note that the approach of not fitting *γ* terms for cohorts with four or fewer observations, as outlined in Appendix B, makes the constraints involving *γ* unnecessary for identifiability. As in Continuous Mortality Investigation (2016b), this means that the APC and APCI models in this paper are *over-constrained*, and will thus usually produce poorer fits than would be expected if a minimal constraint system were adopted. Over-constraining has a different impact on the two models: for the APC model it leads to relatively little change in *κ*, as shown in Figure 7. However, for the APCI model *κ* is little more than a noise process in the minimally constrained model (see Figure 8), while any pattern in *κ* from the over-constrained model appears likely to have been caused by the constraints on *γ*_{y−x}.

**D. Fitting Penalised Constrained Linear Models**

The Age–Period, APC and APCI models in equations (5), (6) and (8) are Generalized Linear Models (GLMs) with identifiability constraints. We smooth the parameters as described in Table 1. We accomplish the parameter estimation, constraint application and smoothing simultaneously using the algorithm presented in Currie (2013). In this section, we outline the three development stages leading up to this algorithm.

Nelder and Wedderburn (1972) defined the concept of a GLM. At its core we have the *linear predictor*, *η*, defined as follows:

$$\eta = X\theta$$

where *X* is the *model matrix* or *design matrix* and *θ* is the vector of parameters in the model. For the model to be identifiable we require that the rank of *X* equals the length of *θ*; the model of Cairns *et al*. (2006) is just such a mortality model (also referred to as M5 in Cairns *et al*., 2009). Nelder and Wedderburn (1972) presented an algorithm of *iteratively weighted least squares* (IWLS), the details of which vary slightly according to (i) the assumption for the distribution of deaths and (ii) the link function connecting the linear predictor to the mean of that distribution. This algorithm finds the values, $$\hat{\theta}$$, which jointly maximise the (log-)likelihood.

*X* can also contain basis splines, which introduces the concept of smoothing and penalisation into the GLM framework; see Eilers and Marx (1996). Currie (2013) extended the IWLS algorithm to find the values, $$\hat{\theta}$$, which jointly maximise the *penalised* likelihood for some given value of the smoothing parameter, *λ*. The optimum value of *λ* is determined outside the likelihood framework by minimising an information criterion, such as the BIC:

$${\rm BIC} = {\rm Dev} + \log(n)\,{\rm ED} \qquad (24)$$

where *n* is the number of observations, Dev is the *model deviance* (McCullagh and Nelder, 1989) and ED is the *effective dimension* of the model (Hastie and Tibshirani, 1986). In the single-dimensional case, as *λ* increases so does the degree of penalisation. The penalised parameters therefore become less free to take values different from their neighbours. The result of increasing *λ* is therefore to reduce the effective dimension of the model, and so equation (24) balances goodness of fit (measured by the deviance, Dev) against the smoothness of the penalised coefficients (measured via the effective dimension, ED). Currie *et al*. (2004) and Richards *et al*. (2006) used such penalised GLMs to fit smooth, two-dimensional surfaces to mortality grids.
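A minimal numeric sketch of this trade-off, assuming the additive form BIC = Dev + log(*n*) × ED; all numbers are purely illustrative.

```python
import math

def bic(deviance, effective_dimension, n_obs):
    """Smoothing criterion: goodness of fit (deviance) plus a complexity
    penalty based on the effective dimension."""
    return deviance + math.log(n_obs) * effective_dimension

# Increasing lambda lowers the effective dimension but raises the
# deviance; the chosen lambda is the one minimising the criterion.
# The (Dev, ED) pairs below are purely illustrative.
fits = {0.1: (1500.0, 40.0), 1.0: (1520.0, 25.0), 10.0: (1600.0, 12.0)}
best_lambda = min(fits, key=lambda lam: bic(*fits[lam], n_obs=2000))
```

Here the heaviest smoothing wins: the 100-point increase in deviance is more than offset by the saving of 28 effective dimensions at log(2000) ≈ 7.6 per dimension.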

We note that penalisation is applied to parameters which exhibit a smooth and continuous progression, such as the *α*_{x} parameters in equations (5)–(8). If a second-order penalty is applied, then as $$\lambda \to \infty$$ the smooth curve linking the parameters becomes ever more like a simple straight line, i.e., the effective dimension of *α*_{x} would tend to ED=2. Alternatively, the *α*_{x} could be replaced with two parameters for a simple straight-line function of age. In the case of equations (5)–(8) this would simplify the models to variants of the Gompertz model of mortality (Gompertz, 1825).
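Under the straight-line assumption, log mortality is linear in age, which is the defining property of the Gompertz law; a small sketch with assumed parameter values:

```python
import math

# Gompertz (1825): log mu_x = a + b * x, i.e., the hazard is log-linear
# in age. Parameter values here are assumed purely for illustration.
a, b = -12.0, 0.11
mu = {x: math.exp(a + b * x) for x in (60, 70, 80)}
# Each extra decade of age multiplies the hazard by the same factor exp(10 * b).
```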

Many linear mortality models also require identifiability constraints, i.e., the rank of the model matrix is less than the number of parameters to be estimated. The Age–Period, APC and APCI models of the main body of this paper fall into this category: they are all linear, but in each case rank(*X*)<length(*θ*). The gap between rank(*X*) and length(*θ*) determines the number of identifiability constraints required. To enable simultaneous parameter estimation, smoothing and application of constraints, Currie (2013) extended the concept of the model matrix, *X*, to the *augmented model matrix*, *X*_{aug}, defined as follows:

$$X_{\rm aug} = \begin{pmatrix} X \\ H \end{pmatrix}$$

where *H* is the *constraint matrix*, with the same number of columns as *X* and where each row of *H* corresponds to one linear constraint. If rank(*X*_{aug})=length(*θ*), the model is identifiable. If rank(*X*_{aug})>length(*θ*), then the model is *over-constrained*; see Appendix C. Note that the use of the augmented model matrix, *X*_{aug}, here restricts *H* to containing linear constraints.

In this paper we use a Poisson distribution and a log link for our GLMs; this is the *canonical link function* for the Poisson distribution. This means that the fitted number of deaths is the anti-log of the linear predictor, i.e., *E*^{c}×*e*^{η}. However, Currie (2014) noted that a logit link often provides a better fit to population data. This would make the fitted number of deaths a logistic function of the linear predictor, i.e., *E*^{c}×*e*^{η}/(1+*e*^{η}). If the logit link is combined with the straight-line assumption for *α*_{x} in equations (5)–(8), this would simplify the models to variants of the Perks model of mortality; see Richards (2008). Currie (2016; Appendix 1) provides R code to implement the logit link for the Poisson distribution for the number of deaths in a GLM. From experience we further suggest specifying good initial parameter estimates to R’s glm() function when using the logit link, as otherwise there can be problems due to very low exposures at advanced ages; the start option in the glm() function can be used for this. In Appendix F we use a logit link to create an M5 Perks model as an alternative to the M5 Gompertz variant using the log link. As can be seen in Table A8, the M5 Perks model fits the data markedly better than the other M5 variants.
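The difference between the two links can be seen directly; the exposure and linear-predictor values below are assumed purely for illustration.

```python
import math

# Fitted deaths under each link for a central exposure E_c and linear
# predictor eta (both values assumed for illustration).
E_c, eta = 1000.0, -2.0
deaths_log = E_c * math.exp(eta)                          # log link
deaths_logit = E_c * math.exp(eta) / (1 + math.exp(eta))  # logit link

# Unlike the log link, the logit link caps the fitted deaths below E_c
# however large eta becomes, which matters at advanced ages where
# exposures are very low.
```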

**E. Projecting** *κ* **and** *γ*

A time series is a sequence of elements ordered by the time at which they occur, and stationarity is a key concept in analysing them. Informally, a time series {*Y*(*t*)} is stationary if {*Y*(*t*)} looks the same at whatever point in time we begin to observe it – see Diggle (1990, page 13). Usually we make do with the simpler second-order stationarity, which involves the mean and autocovariance of the time series. Let:

$$\mu(t) = {\rm E}\left[Y(t)\right],\qquad \gamma(t,s) = {\rm cov}\left\{Y(t),Y(s)\right\}$$

be the mean and autocovariance function of the time series. Then the time series is second-order stationary if:

$$\mu(t) = \mu\ \forall t,\qquad \gamma(t,s) = \gamma\left(\left|t-s\right|\right)$$

that is, the covariance between *Y*(*t*) and *Y*(*s*) depends only on their separation in time; see Diggle (1990, page 58). In practice, when we say a time series is stationary we mean the series is second-order stationary. The assumption of stationarity of the first two moments only is variously known as weak-sense stationarity, wide-sense stationarity or covariance stationarity.

The lag operator, *L*, operates on an element of a time series to produce the previous element. Thus, if we define a collection of time-indexed values {*κ*_{t}}, then *Lκ*_{t}=*κ*_{t−1}. Powers of *L* mean the operator is repeatedly applied, i.e., *L*^{i}*κ*_{t}=*κ*_{t−i}. The lag operator is also known as the backshift operator, while the difference operator, Δ, is 1−*L*.
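The operators can be illustrated on a short series of assumed *κ* values:

```python
# The difference operator (1 - L) applied to a short series of assumed
# kappa values; applying it twice gives the second differences used
# when d = 2 in an ARIMA model.
kappa = [5.0, 4.6, 4.1, 3.5, 2.8]
first_diffs = [b - a for a, b in zip(kappa, kappa[1:])]                # (1 - L) kappa_t
second_diffs = [b - a for a, b in zip(first_diffs, first_diffs[1:])]  # (1 - L)^2 kappa_t
```

Here the first differences fall steadily while the second differences are constant, the kind of pattern for which second-order differencing is appropriate.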

A time series, *κ*_{t}, is said to be *integrated* if the differences of order *d* are stationary, i.e., (1−*L*)^{d}*κ*_{t} is stationary.

A time series, *κ*_{t}, is said to be *autoregressive* of order *p* if it involves a linear combination of the *p* previous values, i.e., $$\left(1 - \sum\nolimits_{i=1}^{p} {\rm ar}_{i} L^{i}\right)\kappa_{t}$$, where ar_{i} denotes an autoregressive parameter to be estimated. An AR process is stationary if the so-called characteristic polynomial of the process has no unit roots; see Harvey (1996). For an AR(1) process this is the case if $$-1 < {\rm ar}_{1} < 1$$. For empirically observed time series, stationarity can be tested using unit-root tests.

A time series, *κ*_{t}, is said to be a *moving average* of order *q* if the current value can be expressed as a linear combination of the past *q* error terms, i.e., $$\left(1 + \sum\nolimits_{i=1}^{q} {\rm ma}_{i} L^{i}\right)\epsilon_{t}$$, where ma_{i} denotes a moving-average parameter to be estimated and {*ε*_{t}} is a sequence of independent, identically distributed error terms with zero mean and common variance, $$\sigma_{\epsilon}^{2}$$. A moving-average process is always stationary.

A time series, *κ*_{t}, can be modelled by combining these three elements as an ARIMA model (Harvey, 1981) as follows:

$$\left(1 - \sum\nolimits_{i=1}^{p} {\rm ar}_{i} L^{i}\right)\left[(1-L)^{d}\kappa_{t} - \mu\right] = \left(1 + \sum\nolimits_{i=1}^{q} {\rm ma}_{i} L^{i}\right)\epsilon_{t}$$

An ARIMA model can be structured with or without a mean value, *μ*; the latter simply means the mean value is set to 0. The behaviour and interpretation of this mean value depend on the degree of differencing, i.e., the value of *d* in ARIMA(*p*, *d*, *q*).

For the Age–Period, APC and Lee–Carter models (but not the APCI model), an ARIMA model for *κ* with *d*=1 is broadly modelling mortality improvements, i.e., *κ*_{t+1}−*κ*_{t}. It will be appropriate where the rate of mortality improvement has been approximately constant over time, i.e., without pronounced acceleration or deceleration. An ARIMA model with *d*=1 but no mean will project gradually decelerating improvements. An ARIMA model with *d*=1 and a fitted mean will project improvements which will gradually tend to that mean value. In most applications the rate at which the long-term mean is achieved is very slow and the curvature in projected values is slight. However, there are two exceptions to this:

∙ Pure moving-average models, i.e., ARIMA(0, *d*, *q*) models. With such models the long-term mean will be achieved quickly, i.e., after *q*+*d* years.

∙ ARIMA models where the autoregressive component is weak. For example, an ARIMA(1, *d*, *q*) model with an ar_{1} parameter close to 0 will also converge to the long-term mean relatively quickly: the closer the absolute value of ar_{1} is to 0, the faster the convergence.
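The first exception can be sketched for an ARIMA(0,1,1) model with mean *μ*, where Δ*κ*_{t}=*μ*+*ε*_{t}+ma_{1}*ε*_{t−1}; all parameter values below are assumed for illustration.

```python
# ARIMA(0,1,1) with mean: Delta kappa_t = mu + eps_t + ma1 * eps_{t-1}.
# Future errors have zero expectation, so only the one-step-ahead
# forecast improvement differs from the long-term mean mu; from the
# second step onwards the forecast improvement equals mu exactly.
mu, ma1, last_eps = -0.02, 0.4, 0.01
forecast_improvements = [mu + ma1 * last_eps] + [mu] * 4  # horizons 1..5
```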

For the Age–Period, APC and Lee–Carter models (but not the APCI model), an ARIMA model for *κ* with *d*=2 is broadly modelling the rate of change in mortality improvements, not the improvements themselves. Thus, with *d*=2 we are modelling (*κ*_{t+2}−*κ*_{t+1})−(*κ*_{t+1}−*κ*_{t}). Such a model will be appropriate where the rate of mortality improvement has been accelerating or decelerating over time. An ARIMA model with *d*=2 and without a mean will project a gradual deceleration of the rate of change in mortality improvements.

To project *κ* and/or *γ* in each of the models in the paper, we fit an ARIMA model. We fit ARIMA models with a mean for *κ* in the Age–Period, APC and Lee–Carter models. We fit ARIMA models without a mean for *γ* in the APC and APCI models, and also for *κ* in the APCI model.

The ARIMA parameters, including the mean where required, are estimated using R’s arima() function, which estimates ARIMA parameters assuming that *κ*_{y} and *γ*_{y−x} are known quantities, rather than the estimated quantities that they really are.

While R’s arima() function returns standard errors, for assessing parameter risk we use the methodology outlined in Kleinow and Richards (2016). The reason for this is that sometimes ARIMA parameter estimates can be borderline unstable, and this can lead to wider confidence intervals for the best-fitting model, as shown in Kleinow and Richards (2016).

To fit an ARIMA model we need to specify the autoregressive order (*p*), the order of differencing (*d*) and the order of the moving average (*q*). For a given level of differencing we fit an ARMA(*p*, *q*) model and choose the values of *p* and *q* by comparing an information criterion; in this paper we used Akaike’s Information Criterion (Akaike, 1987) with a small-sample correction (AICc). Choosing the order of differencing, *d*, is trickier, as the data used to fit the ARMA(*p*, *q*) model are different when *d*=1 and *d*=2: with *n* observations there are *n*−1 first differences, but only *n*−2 second differences. To decide on the ARIMA(*p*, *d*, *q*) model we select the best ARMA(*p*, *q*) model for a given value of *d* using the AICc, then we pick the ARIMA(*p*, *d*, *q*) model with the smallest root mean squared error as per Solo (1984).
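The small-sample correction can be written as a short helper; this is the standard AICc formula, not code from the paper, and the numbers in the usage note are illustrative only.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Akaike's Information Criterion with the small-sample correction:
    AICc = AIC + 2k(k+1)/(n-k-1), where k is the parameter count."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    return aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
```

The correction term vanishes as the number of observations grows, so the AICc converges to the ordinary AIC for long series while penalising extra parameters more heavily for short ones.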

The choice of differencing order is thorny: with *d*=1 we are modelling mortality improvements, but with *d*=2 we are modelling the rate of change of mortality improvements. The latter can produce very different forecasts, as evidenced by comparing the life expectancy for the APC(S) model in Table 3 (with *d*=2) with the life expectancy for the APC model in Table 2 (with *d*=1).

Table A1 Parameters for Autoregressive, Integrated Moving Average (ARIMA)(1,1,2) process for *κ* in smoothed Age–Period model

Table A2 Parameters for Autoregressive, Integrated Moving Average (ARIMA)(1,1,2) process for *κ* in smoothed Lee–Carter model

Table A3 Parameters for Autoregressive, Integrated Moving Average (ARIMA)(0,1,2) process for *κ* in smoothed APC model

Table A4 Parameters for Autoregressive, Integrated Moving Average (ARIMA)(2,1,0) process for *γ* in smoothed APC model

For a VaR assessment of in-force annuities we need to simulate sample paths for *κ*. If we want mortality rates in the upper right triangle of Figure A1, then we also need to simulate sample paths for *γ*. We use the formulae given in Kleinow and Richards (2016) for bootstrapping the mean (for *κ* only) and then use these bootstrapped parameter values for the ARIMA process to include parameter risk in the VaR assessment (Tables 5–10).

**F. Other Models**

In their presentation of a VaR framework for longevity trend risk, Richards *et al*. (2014) included some other models not considered in the main body of this paper. For interest we present comparison figures for members of the Cairns–Blake–Dowd family of stochastic projection models. We first consider a model sub-family based on Cairns *et al*. (2006) (M5) as follows (Table A6):

Table A5 Parameters for Autoregressive, Integrated Moving Average (ARIMA)(1,1,2) process for *κ* in smoothed APCI model

Table A6 Parameters for Autoregressive, Integrated Moving Average (ARIMA)(2,1,0) process for *γ* in smoothed APCI model

Table A7 Definition of M5 Family Under Equation (31)

for some functions *g*() and *w*(), where *κ*_{0} and *κ*_{1} form a bivariate random walk with drift. The three members of the M5 family used here are defined in Table A7, with the results in Tables A8 and A9. We also consider two further models from Cairns *et al*. (2009). First, M6:

Model M6 in equation (32) needs two identifiability constraints and we use equations (19) and (20). As with the M5 family, *κ*_{0} and *κ*_{1} form a bivariate random walk with drift and *γ* is projected using an ARIMA model (as done for the APC and APCI models). We also consider M7 from Cairns *et al*. (2009):

where $$\hat{\sigma}^{2} = \frac{1}{n_{x}}\sum\nolimits_{i=1}^{n_{x}} (x_{i} - \bar{x})^{2}$$. Model M7 in equation (33) needs three identifiability constraints and we use equations (19), (20) and (22). *κ*_{0}, *κ*_{1} and *κ*_{2} form a trivariate random walk with drift and *γ* is projected using an ARIMA model (as done for the APC and APCI models). As with the APC and APCI models, M6 and M7 do not need all these constraints with our treatment of corner cohorts described in Appendix B. Thus, M6 and M7 here are also over-constrained.

Table A8 Expected time lived and annuity factors for unsmoothed models, together with Bayesian Information Criterion (BIC) and Effective Dimension (ED)

Table A9 Results of Value-at-Risk assessment for models in Table A8

Comparing Table A8 with Tables 2 and 3 we can see that the stochastic version of the APCI model produces similar expected time lived and temporary annuity factors to most other models, apart from the APC and M6 models. This suggests that the best-estimate forecasts under the APCI model are consistent with those of other models, rather than extreme.

Comparing Table A9 with Table 4 we can see that, while the AP(S) and APCI(S) models produce the largest VaR99.5 capital requirements at age 70, these are not extreme outliers.

A comparison of Table 4 with the equivalent figures in Richards *et al*. (2014, Table 4) shows considerable differences in VaR99.5 capital at age 70. There are two changes between Richards *et al*. (2014) and this paper that drive these differences. The first change is that Richards *et al*. (2014) discounted cashflows using a flat 3% per annum, whereas in this paper we discount cashflows using the yield curve in Figure 1. The second change lies in the data: in this paper we use UK-wide data for 1971–2015, whereas Richards *et al*. (2014) used England and Wales data for 1961–2010. There are three important sub-sources of variation buried in this change in the data: the first is that the population estimates for 1961–1970 are not as reliable as the estimates which came after 1970; the second is that the data used in this paper include revisions to pre-2011 population estimates following the 2011 census; and the third is that mortality experience after 2010 has been unusual and is not in line with trend. The combined effect of these changes to the discount function and the data has led to the VaR99.5 capital requirements at age 70 for the models in Table A9 being around 0.5% less than for the same models in Richards *et al*. (2014, Table 4). However, a comparison between Figures 9 and A3 shows that these results are strongly dependent on age. As in Richards *et al*. (2014), this means that it is insufficient to consider a few model points for a VaR assessment – insurer capital requirements not only need to be informed by different projection models, but they must take account of the age distribution of liabilities.

Figure A3 VaR99.5 capital requirements by age for models in Table A8

**G. Differences compared to Continuous Mortality Investigation approach**

In this paper we present a stochastic implementation of the APCI model proposed by Continuous Mortality Investigation (2016b). This is the central difference between the APCI model in this paper and its original implementation in Continuous Mortality Investigation (2016a, 2016b). However, there are some other differences of note and they are listed in this section as a convenient overview.

As per Cairns *et al*. (2009), our identifiability constraints for *γ*_{y−x} weight each parameter according to the number of times it appears in the data, rather than assuming equal weight as in Continuous Mortality Investigation (2016b, page 91). As with Continuous Mortality Investigation (2016b), our APC and APCI models are over-constrained (see Appendix C and section 6).

For cohorts with four or fewer observed values we do not estimate a *γ* term – see Appendix B. In contrast, Continuous Mortality Investigation (2016a, pages 27–28) adopts a more complex approach to corner cohorts, involving setting the cohort term to the nearest available estimated term.

For smoothing *α*_{x} and *β*_{x} we have used the penalised splines of Eilers and Marx (1996), rather than the difference penalties in Continuous Mortality Investigation (2016b). Our penalties on *α*_{x} and *β*_{x} are quadratic, whereas Continuous Mortality Investigation (2016b) uses cubic penalties. Unlike Continuous Mortality Investigation (2016b), we do not smooth *κ*_{y} or *γ*_{y−x}. We also determine the optimal level of smoothing by minimising the BIC, whereas Continuous Mortality Investigation (2016b) smooths by user judgement.

As described in Section 3, for parameter estimation we use the algorithm presented in Currie (2013). This means that constraints and smoothing are an integral part of the estimation, rather than the separate steps applied in Continuous Mortality Investigation (2016b, page 15).

Unlike Continuous Mortality Investigation (2016b) we make no attempt to adjust the exposure data.

For projections we use ARIMA models for both *κ*_{y} and *γ*_{y−x}, rather than the deterministic targeting approach of Continuous Mortality Investigation (2016b, pages 31–35). Unlike Continuous Mortality Investigation (2016b), we do not attempt to break down mortality improvements into age, period and cohort components, nor do we have a long-term rate to target, nor any concept of a “direction of travel” (Continuous Mortality Investigation, 2016b, page 14).

**H. Suggestions for Further Research**

There were many other things which could have been done in this paper, but for which there was not the time available. We list some of them here in case others are interested in pursuing them:

∙ Female lives. To illustrate our points, and to provide comparable figures to earlier papers such as Richards *et al*. (2014) and Kleinow and Richards (2016), we used the data for males. However, both insurers and pension schemes have material liabilities linked to female lives, and it would be interesting to explore the application of the APCI model to data on female lives.

∙ Back-testing. It would be interesting to see how the APCI model performs against other models in back-testing, i.e., fitting the models to the first part of the data set and seeing how the resulting forecasts compare with the latter part of the data.

∙ Sensitivity testing. Some models are sensitive to the range of ages selected or the period covered. It would be interesting to know how sensitive the APCI model is to such changes.

∙ Canonical correlation. Models with both period and cohort terms, such as the APC and APCI models, usually have these terms projected as if they are independent. However, such terms are usually correlated, making the assumption of independence at best a simplifying assumption for convenience. It would be interesting to compare the correlations of *κ* and *γ* for the APC and APCI models. Joint models for *κ* and *γ* could be considered.

∙ Over-dispersion. To fit the models, both we and Continuous Mortality Investigation (2017, page 5) assume that the number of deaths follows a Poisson distribution, i.e., that the variance is equal to the mean. However, in practice death counts are usually over-dispersed, i.e., the variance is greater than the mean.