1. Introduction
In non-life insurance, premiums are set through a process known as ratemaking to cover the expected future cost and also allow for the earmarked underwriting profit. The pure premium method and the loss ratio method are two popular traditional ratemaking methods. These methods ensure that the total premium will cover the total costs while allowing for the targeted underwriting profit. To obtain rates, historical claims are developed to ultimate using macro-level reserving techniques and adjusted, using trending techniques, to a level applicable to the future effective period. See Werner and Modlin (2010) and Brown and Lennox (2015) for details on these methods.
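As a rough illustration of the two traditional indications, the sketch below uses their standard textbook forms: the pure premium method gives an indicated average rate of $(\text{pure premium} + F)/(1 - V - Q)$, and the loss ratio method gives an indicated change of $(\text{loss ratio} + F_{\text{ratio}})/(1 - V - Q) - 1$, where $F$ is the fixed expense per exposure, $F_{\text{ratio}}$ the fixed expense ratio, $V$ the variable expense provision, and $Q$ the profit provision. The function names and all figures are hypothetical, and the losses are assumed to be already developed to ultimate and trended.

```python
def pure_premium_indicated_rate(loss_and_lae, exposures, fixed_expense_per_exposure,
                                variable_expense_ratio, profit_provision):
    """Indicated average rate per exposure: (pure premium + F) / (1 - V - Q)."""
    pure_premium = loss_and_lae / exposures
    return (pure_premium + fixed_expense_per_exposure) / (
        1.0 - variable_expense_ratio - profit_provision)


def loss_ratio_indicated_change(loss_and_lae, onlevel_premium, fixed_expense_ratio,
                                variable_expense_ratio, profit_provision):
    """Indicated overall rate change: (loss ratio + F_ratio) / (1 - V - Q) - 1."""
    loss_ratio = loss_and_lae / onlevel_premium
    return (loss_ratio + fixed_expense_ratio) / (
        1.0 - variable_expense_ratio - profit_provision) - 1.0


# Hypothetical book: 1,000 exposures, developed and trended losses of 600,000,
# current average premium of 750, fixed expenses of 50 per exposure.
rate = pure_premium_indicated_rate(600_000, 1_000, 50.0, 0.15, 0.05)
change = loss_ratio_indicated_change(600_000, 750_000, 50.0 / 750.0, 0.15, 0.05)
```

Here the two methods agree (an indicated rate of about 812.5 versus a current average premium of 750 raised by about 8.33%) because the fixed expense ratio is simply the fixed expense per exposure divided by the average premium.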
Accurate risk pricing is expected to provide stronger incentives for caution, resulting in lower claim frequencies and reductions in insurance loss costs (Cummins, 2002). As a result, in addition to determining the overall average rate change with traditional methods, actuaries employ univariate risk classification techniques to identify a base rate and risk classification variables. The base rate is assigned a relativity of one, and the levels of the risk classification variables are assigned their own relativities. Thus, the rates are individual rates that reflect each policyholder's characteristics.
The main shortcoming of univariate risk classification techniques is that they do not consider all rating variables simultaneously. In contrast, multivariate risk classification techniques consider all rating variables simultaneously and automatically adjust for exposure correlations between rating variables. Hence, by employing multivariate risk classification techniques, actuaries develop rates that better align premiums with expected costs. Generalized linear models (GLMs) and machine learning algorithms are two popular multivariate risk classification techniques (Taylor and McGuire, 2016; Wüthrich and Merz, 2023).
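To make the shortcoming of univariate techniques concrete, the sketch below uses a hypothetical 2 × 2 portfolio in which only variable A affects claim frequency, while exposures to A and B are correlated. One-way relativities attribute part of A's effect to B. Bailey's multiplicative minimum-bias iteration, whose fixed point satisfies the same balance (score) equations as a log-link Poisson GLM with main effects for A and B, recovers the correct relativities. The data and function names are illustrative only.

```python
# Hypothetical 2 x 2 portfolio: cells are (level of A, level of B).
# Exposures are concentrated on the diagonal, so A and B are correlated.
exposure = {(0, 0): 400.0, (0, 1): 100.0, (1, 0): 100.0, (1, 1): 400.0}
# Expected claims: base frequency 0.1, A doubles it, B has no effect.
claims = {cell: e * 0.1 * (2.0 if cell[0] == 1 else 1.0) for cell, e in exposure.items()}


def one_way_relativity(var):
    """Univariate relativity of level 1 versus level 0 for variable index var (0 = A, 1 = B)."""
    freq = {}
    for level in (0, 1):
        cells = [c for c in exposure if c[var] == level]
        freq[level] = sum(claims[c] for c in cells) / sum(exposure[c] for c in cells)
    return freq[1] / freq[0]


def minimum_bias_relativities(n_iter=50):
    """Bailey's multiplicative minimum-bias iteration for the factors of A and B."""
    fa, fb = {0: 1.0, 1: 1.0}, {0: 1.0, 1: 1.0}
    for _ in range(n_iter):
        for a in (0, 1):  # balance observed and fitted claims within each level of A
            fa[a] = sum(claims[a, b] for b in (0, 1)) / sum(exposure[a, b] * fb[b] for b in (0, 1))
        for b in (0, 1):  # then within each level of B
            fb[b] = sum(claims[a, b] for a in (0, 1)) / sum(exposure[a, b] * fa[a] for a in (0, 1))
    return fa[1] / fa[0], fb[1] / fb[0]


one_way = (one_way_relativity(0), one_way_relativity(1))  # about (2.0, 1.5)
multivariate = minimum_bias_relativities()                # about (2.0, 1.0)
```

Multiplying the one-way relativities for a policy with A = 1 and B = 1 gives 2.0 × 1.5 = 3.0 times the base rate, whereas the true relativity is 2.0: the correlated exposure is counted twice.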
An observation from the ratemaking literature is that the multivariate analysis data are often based on closed claims, where the ultimate amount paid for all claims is known. This observation is not surprising because practitioners want to use the most recent data but are limited by the requirement that the data be complete, thus leading to a natural friction between using only closed claims from older policy years and using the information on all reported claims, including information on open claims. For example, last year’s data are most representative of current trends but are certainly incomplete (censored).
Using only closed claims eliminates the uncertainty associated with open claims. However, closed claims alone may not reflect shifts in the distribution of expected claim payments. Practicing actuaries are well aware of these biases and have developed ad hoc on-leveling methods to adjust the data to the current level. For example, premium trending techniques are used to adjust the historical premium to the level expected during the future time period, and the actuary needs to determine how to measure and incorporate the shifts that have occurred. In contrast, information on open claims reflects shifts in the distribution of expected claim payments automatically. Hence, to better align premiums with expected future costs, actuaries need to extract all the information they can from open claims, especially in these pandemic times.
This paper presents an intuitive ratemaking framework that makes it routine to base the multivariate risk analysis on information from both closed claims and payments on open claims. To model the complete development of claims for ratemaking purposes, we employ the marked Poisson process (MPP) framework with three hierarchical building blocks. The first building block models the number of reported claims per policy and, through a distinctive feature of the MPP framework, accounts for the future cost of Incurred But Not Reported (IBNR) claims by analyzing the reporting delay distribution of claims. The second and third building blocks model the conditional number of payment transactions for a claim and the conditional payment size of each transaction, respectively. For Reported But Not Settled (RBNS) claims, the number of transactions is censored at the ratemaking date, and this censoring is explicitly accounted for. One advantage of the MPP is that the likelihood of the claims process decomposes into independent blocks, so that each block can be maximized in isolation (Larsen, 2007). As a result, the parameters of each block are estimated with the appropriate GLMs. We use policy covariates that are readily available for both new and existing policyholders over an observation period.
The MPP framework, introduced by Arjas (1989), Jewell (1989), and Norberg (1993, 1999), has been widely used for individual-level loss reserving. For example, Antonio and Plat (2014) and Verrall and Wüthrich (2016) apply the marked Poisson process to non-life insurance loss reserving, where claim occurrences are assumed to follow a non-homogeneous Poisson process and the stochastic characteristics of the claims are treated as marks. The hierarchical makeup of the MPP framework also provides flexibility in modeling different events and their features in the ratemaking process. Hierarchical models are not new in insurance pricing. For example, the frequency-severity model, a popular multivariate approach to modeling the claim frequency and payments arising from closed claims, forms a two-level hierarchical pricing model. See Frees (2014) for details and applications of frequency-severity models. Frees and Valdez (2008) and Frees et al. (2009) extended the frequency-severity model to a hierarchical model with three building blocks relating to the frequency, type, and severity of claims. Shi et al. (2016) provide a hierarchical framework for modeling insurance loss costs with a complex structure and propose a copula regression to accommodate various sources of dependence, and Karoui et al. (2017) discuss a robust optimization approach to promptly identify shifts in the frequency of insurance claims by detecting disruptions in the intensity of the counting process, even when the changes occur at unpredictable and unobservable times.
We contribute to the literature in two main ways. First, the proposed framework ensures that the multivariate risk analysis uses information on both open and closed claims more efficiently, leading to premiums that are better aligned with expected costs. Second, by automatically accounting for the expected cost of open claims without a separate claim reserving model, the proposed framework bridges the gap between ratemaking and reserving and makes the ratemaking process complete and balanced for individual risks.
The rest of the paper is organized as follows. Section 2 presents the MPP model and its application to ratemaking. Section 3 discusses parameter estimation and loss cost prediction for the MPP model. Section 4 evaluates the ratemaking performance of the MPP model using simulation studies. Section 5 provides the model fitting results using a training dataset and evaluates the MPP model's quality of prediction using out-of-sample data. Section 6 concludes.
2. Claim modeling
2.1. Data structure
Let $j=1,\ldots, J$ represent the index for policies in the portfolio, and $t=1,\ldots, T_j$ represent the policy years observed for each policy, then the observable responses at the ratemaking date are:

• $N_{jt}$ , the number of claims reported within a policy year t for each policy j.

• $M_{jt,i}$ , the number of transactions for each claim, where $i=1,\ldots, N_{jt}$ is the claim index. For open claims, $M_{jt,i}$ is censored; we denote $\delta_{jt,i}=1$ when the claim is closed and $\delta_{jt,i}=0$ otherwise.

• $P_{jt,ik}$ , the payment amount per transaction, where $k=1,\ldots, M_{jt,i}$ is the payment transaction index.
The exposure $e_{jt}$ is measured as a fraction of a year and gives the length of time in the policy year covered as of the ratemaking date. For the explanatory covariates, policy-level characteristics represented by $\boldsymbol{x}_{jt}$ are used. Additionally, $U_{jt,i}$ is the reporting delay, that is, the difference between the reporting and occurrence times of claim i reported in the $\{jt\}$ observation period. Then, the data available at the ratemaking date can be summarized as $\{N_{jt}, e_{jt}, \boldsymbol{x}_{jt}, U_{jt,i}, M_{jt,i}, \delta_{jt,i}, P_{jt,ik}\}$ over all policies j, policy years t, claims i, and transactions k.
For the ratemaking exercise in this paper, we assume $\{N_{jt}, M_{jt,i}\}$ are recorded for each policy year, but the analysis done in this paper can be easily extended to other periods. Further, insurers usually record the reporting and claim occurrence dates, and we will assume the reporting delay variable $U_{jt,i}$ is in days.
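The observable responses nest naturally: a policy year holds claims, and a claim holds payment transactions. The sketch below is one possible in-memory layout, assuming that calendar dates are recorded for occurrence, reporting, and payment. The class and field names are hypothetical, the policy-year end date is treated as exclusive, and exposure is approximated with a 365-day year.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Transaction:
    paid_on: date
    amount: float                      # P_{jt,ik}


@dataclass
class Claim:
    occurred_on: date
    reported_on: date
    closed: bool                       # delta_{jt,i}
    transactions: list = field(default_factory=list)

    @property
    def reporting_delay_days(self) -> int:    # U_{jt,i}, in days
        return (self.reported_on - self.occurred_on).days

    @property
    def n_transactions(self) -> int:          # M_{jt,i}; censored when the claim is open
        return len(self.transactions)


@dataclass
class PolicyYear:
    policy_id: int                     # j
    year: int                          # t
    start: date
    end: date                          # exclusive end of the policy year
    covariates: dict                   # x_{jt}
    claims: list = field(default_factory=list)

    @property
    def n_claims(self) -> int:                # N_{jt}
        return len(self.claims)

    def exposure(self, ratemaking_date: date) -> float:   # e_{jt}, in years
        covered_until = min(self.end, ratemaking_date)
        return max((covered_until - self.start).days, 0) / 365.0


py = PolicyYear(policy_id=1, year=2013, start=date(2013, 1, 1), end=date(2014, 1, 1),
                covariates={"x1": 1, "x2": -0.4})
py.claims.append(Claim(occurred_on=date(2013, 3, 1), reported_on=date(2013, 3, 11), closed=False,
                       transactions=[Transaction(date(2013, 4, 2), 1500.0)]))
```

With a ratemaking date of 2 July 2013, this policy year contributes 182/365 of a year of exposure and one open claim with a 10-day reporting delay.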
2.2. Marked Poisson process
Figure 1 illustrates the timeline of claim occurrences at times $V_1 = v_1, V_2 = v_2, \ldots, V_n = v_n$ and of transaction occurrences at times $S_1 = s_1, S_2 = s_2, \ldots, S_m = s_m$ in a fixed period $[0, \tau]$. In this paper, $\tau$ represents the ratemaking date. The figure shows two counting processes: one for claim occurrence and the other for transaction occurrence after reporting.
The counting process $\{N(v), v \geq 0\}$ associated with the claim occurrence process in Figure 1 is Poisson and records the cumulative number of claims that the process generates. We denote by $H(v)=\{N(u)\,:\,0 \leq u < v\}$ the history of the claim occurrence process at time v. Then, the intensity function of the claim occurrence process, which depends only on v, is given by:
$$\rho(v) = \lim_{\Delta v \downarrow 0} \frac{\Pr\{\Delta N(v) = 1 \mid H(v)\}}{\Delta v},$$
where $\rho(v)$ is a nonnegative integrable function and $\Delta N(v)$ represents the number of claims in the short interval $[v,v+\Delta v)$. Further, observable covariates $\boldsymbol{x}_j(v)$ for policyholders $j=1, \ldots, J$ that affect claim occurrence can be incorporated in the model by including the covariate information in the process history. Thus, heterogeneity among policyholders can be accounted for by specifying an intensity function of the form:
$$\rho_j\big(v \mid \boldsymbol{x}_j^{v}\big) = \rho_0(v)\exp\!\big(\boldsymbol{x}_j(v)^{\prime}\boldsymbol{\beta}\big),$$
where $\boldsymbol{x}_j^{v}=\{\boldsymbol{x}_j(u)\,:\,0\leq u\leq v\}$ is the covariate history, $\rho_0(v)$ is the baseline intensity, which applies to policyholders with $\boldsymbol{x}_j(v)=0$ for all v, and $\boldsymbol{\beta}$ is a vector of regression coefficients for the covariates.
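Given an intensity of this multiplicative form, claim occurrence times can be simulated by thinning (the Lewis and Shedler method): candidates are drawn from a homogeneous process with a dominating rate $\rho_{\max}$ and each is kept with probability $\rho_j(v)/\rho_{\max}$. The sketch assumes covariates that are constant over time and a hypothetical piecewise-constant baseline; it is an illustration, not the simulation algorithm used in this paper.

```python
import math
import random


def simulate_nhpp(intensity, tau, rho_max, rng):
    """Occurrence times on [0, tau] by thinning; needs intensity(v) <= rho_max there."""
    times, v = [], 0.0
    while True:
        v += rng.expovariate(rho_max)               # next candidate of a rate-rho_max process
        if v > tau:
            return times
        if rng.random() <= intensity(v) / rho_max:  # keep with probability rho(v) / rho_max
            times.append(v)


def make_intensity(baseline, x, beta):
    """rho_j(v) = rho_0(v) * exp(x_j' beta), with covariates held fixed over time."""
    scale = math.exp(sum(xk * bk for xk, bk in zip(x, beta)))
    return lambda v: baseline(v) * scale


def baseline(v):
    """Hypothetical baseline: 0.5 claims per year on [0, 1] and 0.8 on (1, 2]."""
    return 0.5 if v <= 1.0 else 0.8


rho = make_intensity(baseline, x=[1.0, 0.2], beta=[0.4, -0.1])
rng = random.Random(2024)
counts = [len(simulate_nhpp(rho, 2.0, 2.0, rng)) for _ in range(20_000)]
mean_count = sum(counts) / len(counts)
expected = math.exp(0.4 - 0.1 * 0.2) * (0.5 * 1.0 + 0.8 * 1.0)  # integral of rho_j over [0, 2]
```

The simulated mean count should be close to $\int_0^2 \rho_j(v)\,dv$, the Poisson mean implied by the intensity.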
The marked Poisson process (MPP) framework is employed to model claims for insurance pricing. For a marked Poisson process on $[0, \tau]$, the likelihood that n claims occur at times $V_1 = v_1, V_2 = v_2, \ldots, V_n = v_n$, with marks $Z_1 = z_1, Z_2 = z_2, \ldots, Z_n = z_n$, is given by:
$$\left\{\prod_{i=1}^{n} \rho(v_i)\, P_{\boldsymbol{Z}|v_i}(z_i)\right\}\exp\!\left(-\int_0^{\tau}\rho(v)\,dv\right).$$
Here, the claim occurrence counting process N(v) is a Poisson process with intensity function $\rho(v)$. The distribution of the marks $P_{\boldsymbol{Z}|v}$ is conditional on $\Delta N(v)=1$.
The MPP framework allows for modeling the entire claim process, including occurrence, reporting, and development after reporting. Let $\boldsymbol{Z}_i = (U_i,\boldsymbol{W}_i)$, with $U_i$ and $\boldsymbol{W}_i$ denoting the reporting delay and the claim development process after reporting, respectively. As seen in Figure 1, $\boldsymbol{W}_i$ includes the payment transaction occurrence times $S_{ik}$ and the severity of each transaction $P_{ik}$, where $k=1,\ldots, m_i$ indexes the payment transactions for the ith claim. Then, the distribution of the marks is specified as $P_{\boldsymbol{Z}|v} = P_{U|v} \times P_{\boldsymbol{W}|v,u}$. This paper assumes that the claim occurrence process and the marks, such as the reporting delay and transaction payments, are independent. It also assumes that the marks are independent of each other.
The reporting delay distribution given the occurrence time v, $P_{U|v}=P_{U}$, can be modeled using various distributions from survival analysis; we specify a mixed distribution comprising a discrete distribution for reporting delays of at most d days and a Weibull distribution, with density function $f_U$, for reporting delays above d days. The likelihood for the reporting delay is given by:
where the probability mass for a reporting delay of r days is given by $q_r$. Specifically, we use $d=0$, that is, a single probability mass $q_0$ for a reporting delay of zero days (reporting on the day of occurrence). To incorporate the policyholder characteristics $\boldsymbol{x}_{j}$ that may affect the reporting delay distribution of claim i, we specify a Weibull distribution with a scale parameter $\theta_{i}$ that depends on the policyholder characteristics and a constant shape parameter $\kappa$, given by:
The other component of $P_{\boldsymbol{Z}|v}$ is the distribution of the claim development process after reporting, $P_{\boldsymbol{W}|v,u}$. We assume that the occurrence of transactions for claim i also follows a non-homogeneous Poisson process on $[0, \tau]$, with the transaction payment amounts treated as marks. Then, the likelihood that $m_i$ transactions occur at times $S_{i1} = s_{i1}, S_{i2} = s_{i2}, \ldots, S_{im_i} = s_{im_i}$, with marks $P_{i1} = p_{i1}, P_{i2} = p_{i2}, \ldots, P_{im_i} = p_{im_i}$, is given by:
$$\left\{\prod_{k=1}^{m_i} \lambda_i(s_{ik})\, f_{\boldsymbol{P}}(p_{ik})\right\}\exp\!\left(-\int_0^{\tau}\lambda_i(s)\,ds\right).$$
Here, the transaction occurrence counting process $M_i(s)$ is a Poisson process with intensity function $\lambda_i(s)$ , k applies to all payments in $[0,\tau]$ , and the distribution of the transaction payment amounts is conditional on $\Delta M_i(s)=1$ . The density function for the payment severity is denoted by $f_{\boldsymbol{P}}(p_{ik})$ .
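The reporting delay component can be sketched as follows, assuming a point mass $q_0$ at a delay of zero days plus a Weibull part, weighted by $1-q_0$, for positive delays, with a log-linear scale $\theta_i=\exp(\boldsymbol{x}_j^{\prime}\boldsymbol{\gamma})$ and a constant shape $\kappa$. The log-linear scale, the $1-q_0$ weighting, the helper names, and the parameter values are our assumptions. Note that Python's `random.Random.weibullvariate` takes the scale first and the shape second.

```python
import math
import random


def weibull_scale(x, gamma):
    """theta_i = exp(x_j' gamma); the log-linear form is an assumption."""
    return math.exp(sum(xk * gk for xk, gk in zip(x, gamma)))


def delay_cdf(u, q0, theta, kappa):
    """F_U(u): point mass q0 at zero days plus (1 - q0) times a Weibull(theta, kappa) cdf."""
    if u < 0:
        return 0.0
    return q0 + (1.0 - q0) * (1.0 - math.exp(-((u / theta) ** kappa)))


def delay_loglik(u, q0, theta, kappa):
    """Log-likelihood contribution of one observed reporting delay u (days)."""
    if u == 0:
        return math.log(q0)
    z = u / theta
    return (math.log(1.0 - q0) + math.log(kappa / theta)
            + (kappa - 1.0) * math.log(z) - z ** kappa)


def sample_delay(q0, theta, kappa, rng):
    """Zero with probability q0, otherwise a Weibull draw with scale theta and shape kappa."""
    if rng.random() < q0:
        return 0.0
    return rng.weibullvariate(theta, kappa)


# Hypothetical values: q0 = 0.3, kappa = 0.8, gamma = (3.0, 0.2) with x = (1, 1).
theta = weibull_scale([1.0, 1.0], [3.0, 0.2])
rng = random.Random(7)
draws = [sample_delay(0.3, theta, 0.8, rng) for _ in range(20_000)]
share_same_day = sum(d == 0.0 for d in draws) / len(draws)
share_within_30 = sum(d <= 30 for d in draws) / len(draws)
```

Summing `delay_loglik` over observed delays gives a log-likelihood that could be maximized numerically over $q_0$, $\boldsymbol{\gamma}$, and $\kappa$; the fitted `delay_cdf` is what the IBNR adjustment needs.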
3. Statistical inference
3.1. Estimating parameters
At the ratemaking time $\tau$, there are reported claims whose full or partial development process is observed, that is, $\boldsymbol{\mathcal{C}}^{rep} = \{(v, u, w) \in \boldsymbol{\mathcal{C}} : v + u \leq \tau\}$, and IBNR claims whose development process is totally unobserved, that is, $\boldsymbol{\mathcal{C}}^{ibnr} = \{(v, u, w) \in \boldsymbol{\mathcal{C}} : v\leq \tau,\ v + u > \tau \}$. Then, the occurrence of reported claims follows an independent Poisson process with intensity function $\rho(v)F_{U|v}(\tau-v)$, and that of IBNR claims also follows an independent Poisson process with intensity function $\rho(v)\big(1-F_{U|v}(\tau-v)\big)$ (Wüthrich and Merz, 2008).
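The split of the occurrence process into reported and IBNR parts can be checked by simulation. The sketch assumes a homogeneous occurrence rate and a Weibull reporting delay measured in years (both hypothetical choices) and compares the simulated mean number of reported claims with $\int_0^\tau \rho\,F_{U}(\tau-v)\,dv$, evaluated by the midpoint rule; the IBNR count should then average $\rho\tau$ minus that integral.

```python
import math
import random


def weibull_cdf(u, scale=0.3, shape=0.7):
    """Hypothetical reporting-delay cdf on a yearly time scale."""
    return 0.0 if u < 0 else 1.0 - math.exp(-((u / scale) ** shape))


def expected_reported(rho, tau, cdf, n_grid=10_000):
    """Midpoint rule for the integral of rho * F_U(tau - v) over [0, tau]."""
    h = tau / n_grid
    return sum(rho * cdf(tau - (k + 0.5) * h) * h for k in range(n_grid))


def simulate_split(rho, tau, rng, scale=0.3, shape=0.7):
    """One path: homogeneous occurrences on [0, tau], each either reported by tau or IBNR."""
    n_rep = n_ibnr = 0
    v = rng.expovariate(rho)
    while v <= tau:
        if v + rng.weibullvariate(scale, shape) <= tau:
            n_rep += 1
        else:
            n_ibnr += 1
        v += rng.expovariate(rho)
    return n_rep, n_ibnr


rho, tau = 20.0, 1.0
rng = random.Random(5)
paths = [simulate_split(rho, tau, rng) for _ in range(5_000)]
mean_rep = sum(p[0] for p in paths) / len(paths)
mean_ibnr = sum(p[1] for p in paths) / len(paths)
theory_rep = expected_reported(rho, tau, weibull_cdf)
```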
It follows that the observed likelihood of the claims process is given by:
where $f({\cdot})$ and $F({\cdot})$ denote a pdf and a cdf, respectively. The superscript in the claim development term (the last term in (3.1)) indicates that a claim that occurred at $v_i$ with reporting delay $u_i$ is censored $\tau - v_i - u_i$ time units after reporting. As discussed earlier, given the occurrence time v and the reporting delay u, the claim development process $\boldsymbol{W}$ can be decomposed into the payment transaction occurrence times $\boldsymbol{S}$ and the transaction payment amounts $\boldsymbol{P}$. Then, the observed likelihood in (3.1) becomes:
Here, k applies to all payments in $[0,\tau_i]$, where $\tau_i=\min(\tau-v_i-u_i, S_i)$ and $S_i$ is the total waiting time from reporting to settlement of claim i. We emphasize that, in addition to the claim occurrence counting process N(v) with intensity function $\rho(v)$, the transaction occurrence counting process $M_i(s)$ is also a Poisson process, with intensity function $\lambda_i(s)$.
The ratemaking data are organized by the $\{jt\}$ observation period, where $j=1,\ldots, J$ indexes policyholders and $t=1,\ldots,T_j$ indexes the policy years of each policyholder. Let the intensity function of the claim occurrence counting process $N_{jt}(v)$ be $\rho_{jt}(v)$ and that of the transaction occurrence counting process $M_{jt,i}(s)$ be $\lambda_{jt,i}(s)$. Then, the likelihood for the observed claims process in (3.2) becomes:
where $\tau\in[T_j-1,T_j]$, $w_{jt}(v)$ is the exposure at time v, $n_{jt}$ denotes the number of reported claims that occur in the $\{jt\}$ observation period, and $m_{jt,i}$ is the number of transactions for claim i reported in the $\{jt\}$ observation period. For ratemaking, the number of transactions to settlement is of interest, but for RBNS claims, the number of transactions is censored at the ratemaking date $\tau$. Therefore, we define $\delta_{jt,i}=I(S_{jt,i}\leq\tau_{jt,i})$ to indicate whether the claim has been closed by the ratemaking time, where $S_{jt,i}$ is the total waiting time from reporting to settlement of claim i reported in the $\{jt\}$ observation period. Thus, $\delta_{jt,i}=1$ for closed claims and $\delta_{jt,i}=0$ for RBNS claims at $\tau$. Moreover, since $S_{jt,i}>\tau_{jt,i}$ for RBNS claims, $M_{jt,i}(S_{jt,i})\geq M_{jt,i}(\tau_{jt,i})=m_{jt,i}$.
The likelihood in (3.3) for reported claims can be broken down into three building blocks: the number of claims per policy in a policy year, the conditional number of payment transactions for a claim, and the conditional payment size of each transaction. Because the likelihood decomposes into independent blocks, each block can be maximized in isolation. However, the MPP is a continuous-time model, whereas the recorded claim occurrence and transaction occurrence data available for statistical inference are discrete. Thus, we assume a piecewise constant specification for the intensity functions, which allows the recorded number of claims and the number of transactions per claim to be used for estimation. Each block is discussed below.
3.1.1. Poisson process for claim frequency
The first line of the likelihood in (3.3) relates to the occurrence of reported claims. A multiplicative form of the intensity function is assumed, where $\rho_{jt}(v)=\rho_0(v;\,\boldsymbol{\alpha})\exp\!(\boldsymbol{x}^{\prime}_{jt}\boldsymbol{\beta})$. Here, $\boldsymbol{x}_{jt}$ are the rating variables, and $\{\boldsymbol{\alpha}, \boldsymbol{\beta}\}$ are parameters to be estimated. To estimate $\rho_{jt}(v)$, we assume that claim occurrence follows a Poisson process with a non-homogeneous, piecewise constant intensity $\rho_{jt}$, such that the baseline rate function is given by:
$$\rho_0(v;\,\boldsymbol{\alpha}) = \sum_{t=1}^{T}\alpha_t\, I(a_{t-1} < v \leq a_t),$$
where $t=1,\ldots, T$, and T is the most recent policy effective year. The parameters of the baseline rate function are denoted by $\boldsymbol{\alpha}=(\alpha_1,\ldots,\alpha_T)$, and $a_0<a_1<\cdots<a_T$ are the cutpoints of the intervals for the baseline function, where $a_0=0$ and $a_T=T$. Then, the occurrence of reported claims follows an independent Poisson process with intensity function $\rho_{jt}F_{U|v}(\tau-v)$. The corresponding likelihood for the occurrence of reported claims is given by:
where $\rho_{jt}=\alpha_t\exp\!(\boldsymbol{x}^{\prime}_{jt}\boldsymbol{\beta})$, and $e_{jt}=\int_{t-1}^{t}w_{jt}(u)\,du$ is the exposure time in $(a_{t-1},a_t]$ for policyholder j. For ratemaking purposes, the likelihood in (3.5) can be maximized via a Poisson regression with a log link, where $\rho_{jt}= \exp(\ln \alpha_t+ \boldsymbol{x}^{\prime}_{jt}\boldsymbol{\beta})$ is the mean of the response $N_{jt}$. Here, $\ln(e_{jt})$ and $\ln\!\left(\int_{t-1}^{t}F_{U|v}(\tau-v)\,dv\right)$ are specified as offset variables to account for the exposure as at the ratemaking date and to adjust the parameters for IBNR claims, respectively. An estimate of the reporting delay distribution is obtained by fitting the distribution in (2.5). Hence, the estimation of claim frequency is a two-stage approach: first, the reporting delay model is estimated; second, the estimated parameters from the reporting delay model are plugged in to estimate the claim frequency model.
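The two-stage procedure can be sketched as follows. The IBNR offset $\int_{t-1}^{t}F_{U|v}(\tau-v)\,dv$ is computed numerically from a (here hypothetical) fitted delay cdf, and the Poisson regression with offsets $\ln e_{jt}$ and the log of that integral is fitted by iteratively reweighted least squares. The design (one indicator column per policy year, whose coefficients estimate $\ln\alpha_t$, plus one rating variable), the helper names, and all numbers are illustrative assumptions; in practice a standard GLM routine with an offset term would be used.

```python
import math


def ibnr_offset(t, tau, delay_cdf, n_grid=2_000):
    """Midpoint rule for int_{t-1}^{t} F_U(tau - v) dv, with F_U(u) = 0 for u < 0."""
    h = 1.0 / n_grid
    return sum(delay_cdf(tau - (t - 1 + (k + 0.5) * h)) * h for k in range(n_grid))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson_offset(X, y, offset, n_iter=50, tol=1e-10):
    """IRLS for a log-link Poisson GLM: log E[y] = offset + X coef."""
    p = len(X[0])
    mu = [yi + 0.1 for yi in y]                                   # common starting values
    coef = [0.0] * p
    for _ in range(n_iter):
        eta = [math.log(m) - o for m, o in zip(mu, offset)]      # linear predictor net of offset
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]   # working response; weights = mu
        xtwx = [[sum(r[a] * r[b] * m for r, m in zip(X, mu)) for b in range(p)] for a in range(p)]
        xtwz = [sum(r[a] * m * zi for r, m, zi in zip(X, mu, z)) for a in range(p)]
        new = solve(xtwx, xtwz)
        change = max(abs(c1 - c0) for c1, c0 in zip(new, coef))
        coef = new
        mu = [math.exp(o + sum(xk * ck for xk, ck in zip(r, coef))) for r, o in zip(X, offset)]
        if change < tol:
            break
    return coef


def delay_cdf(u):
    """Hypothetical fitted delay cdf (years): mass 0.3 at zero plus a Weibull(0.1, 0.8) part."""
    return 0.0 if u < 0 else 0.3 + 0.7 * (1.0 - math.exp(-((u / 0.1) ** 0.8)))


# Two policy years, ratemaking date tau = 1.5; design = [year-1 dummy, year-2 dummy, x].
tau = 1.5
years = [1, 1, 1, 2, 2, 2]
x = [0.0, 1.0, -0.5, 0.0, 1.0, 2.0]
expo = [1.0, 1.0, 0.5, 0.5, 0.4, 0.3]
X = [[float(t == 1), float(t == 2), xv] for t, xv in zip(years, x)]
offset = [math.log(e) + math.log(ibnr_offset(t, tau, delay_cdf)) for e, t in zip(expo, years)]
true_coef = [math.log(0.1), math.log(0.15), 0.5]
# Noise-free "counts" equal to their means, so the fit should recover true_coef.
y = [math.exp(o + sum(xk * ck for xk, ck in zip(r, true_coef))) for r, o in zip(X, offset)]
coef = fit_poisson_offset(X, y, offset)
```

Exponentiating the first two coefficients gives the year-level baseline rates, and the last coefficient is the rating-variable effect after the IBNR adjustment.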
3.1.2. Poisson process for transaction frequency
The second line in the likelihood in (3.3) relates to transaction occurrence conditional on having at least a claim. The transaction counting process $M_{jt,i}(s)$ is also Poisson with intensity measure $\lambda_{jt,i}(s)=\lambda_0(s;\,\boldsymbol{b})\exp\!(\boldsymbol{x}^{\prime}_{jt}\boldsymbol{\pi})$ . Again, a piecewise constant intensity $\lambda_{jt,i}$ is assumed such that the baseline rate function is given by:
where $\boldsymbol{b}=(b_1,\ldots,b_T)$ are parameters of the baseline rate function. Here, the likelihood for the transaction occurrence can be specified as proportional to the product of Poisson likelihoods shown as:
Since $M_{jt,i}(S_{jt,i})\geq m_{jt,i}$ for RBNS claims, the likelihood in (3.7) can be maximized using a censored Poisson regression with a log link, where $\lambda_{jt,i}=\exp(\ln b_t + \boldsymbol{x}^{\prime}_{jt}\boldsymbol{\pi})$ is the mean of the response $M_{jt,i}$. The likelihood for the censored Poisson is given by:
where $f({\cdot})$ is the Poisson probability mass function, and $\delta_{jt,i}=1$ if the claim is closed and $\delta_{jt,i}=0$ if it is open.
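The effect of censoring on the transaction count can be seen in a stripped-down, intercept-only version of this likelihood: closed claims contribute $f(m)$ and RBNS claims contribute $\Pr(M \geq m)$. The sketch maximizes it by golden-section search on simulated data and compares the result with the naive mean that treats every observed count as complete. The data-generating choices, including how open claims are cut off, are hypothetical, and a full implementation would add covariates through the log link.

```python
import math
import random


def log_pmf(m, lam):
    """Log of the Poisson probability mass function."""
    return -lam + m * math.log(lam) - math.lgamma(m + 1)


def censored_loglik(lam, counts, closed):
    """log f(m) for closed claims plus log P(M >= m) for RBNS (open) claims."""
    total = 0.0
    for m, is_closed in zip(counts, closed):
        if is_closed:
            total += log_pmf(m, lam)
        else:
            tail = 1.0 - sum(math.exp(log_pmf(k, lam)) for k in range(m))  # P(M >= m)
            total += math.log(max(tail, 1e-300))
    return total


def fit_rate(counts, closed, lo=1e-3, hi=50.0, iters=80):
    """Golden-section search for the lambda that maximizes the censored log-likelihood."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if censored_loglik(c, counts, closed) > censored_loglik(d, counts, closed):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2.0


def poisson_draw(rng, mean):
    """Knuth's multiplication method for a Poisson variate with a small mean."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


# Hypothetical data: 3 transactions per claim on average, about half the claims still open.
rng = random.Random(11)
counts, closed = [], []
for _ in range(1_000):
    total = poisson_draw(rng, 3.0)
    if rng.random() < 0.5:
        cut = rng.random()  # share of the development period elapsed by tau
        counts.append(sum(rng.random() < cut for _ in range(total)))
        closed.append(False)
    else:
        counts.append(total)
        closed.append(True)

naive = sum(counts) / len(counts)  # treats every observed count as complete
censored = fit_rate(counts, closed)
```

Because each open claim only says "at least m transactions", the censored estimate exceeds the naive mean; when every claim is closed, the two coincide.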
3.1.3. Transaction severity modeling
The conditional severity block describes the claim payment size per transaction. Different distributions can be used to model the transaction payments $P_{jt,ik}$ with conditional mean $\mu_{jt,ik}=\exp(\boldsymbol{x}^{\prime}_{jt}\boldsymbol{\phi})$. The gamma GLM is frequently used in insurance pricing to model payment sizes (Henckaerts et al., 2018), but goodness-of-fit tests should be performed to select an appropriate distribution for the data. Certain characteristics of the data may also inform model choices. For example, Antonio and Plat (2014) built separate models for the first transaction payments and the later transaction payments.
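As an illustration of the severity block, the sketch below fits a gamma GLM with log link by Fisher scoring on simulated payments. With the log link the working weights are constant, so each iteration reduces to an ordinary least-squares fit of the working response on the covariate. The single covariate, the parameter values, and the gamma shape are hypothetical; goodness-of-fit checks would still be needed before adopting the gamma family.

```python
import math
import random


def fit_gamma_log_link(x, y, n_iter=100, tol=1e-10):
    """Fisher scoring for a gamma GLM, log E[y] = b0 + b1 * x.

    With the log link the IRLS weights (d mu / d eta)^2 / V(mu) = mu^2 / mu^2 are
    constant, so each step is an ordinary least-squares fit of the working response.
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b0, b1 = math.log(sum(y) / n), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        z = [math.log(m) + (yi - m) / m for m, yi in zip(mu, y)]  # working response
        zbar = sum(z) / n
        new_b1 = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
        new_b0 = zbar - new_b1 * xbar
        step = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if step < tol:
            break
    return b0, b1


# Hypothetical payments: mean exp(1.0 + 0.3 x), gamma shape 2 (dispersion 1/2).
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(5_000)]
shape = 2.0
y = [rng.gammavariate(shape, math.exp(1.0 + 0.3 * xi) / shape) for xi in x]
b0_hat, b1_hat = fit_gamma_log_link(x, y)
```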
3.2. Loss cost prediction using the MPP Model
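A rating formula based on the MPP ratemaking framework is obtained as the product of the exponentiated estimates from the claim frequency, transaction frequency, and severity models. The predicted loss cost is calculated with the following rating formula:
$$\widehat{LC}_j = e_j \times \exp\!\big(\ln\hat{\alpha}_T + \boldsymbol{x}^{\prime}_{j}\hat{\boldsymbol{\beta}}\big) \times \exp\!\big(\ln\hat{b}_T + \boldsymbol{x}^{\prime}_{j}\hat{\boldsymbol{\pi}}\big) \times \exp\!\big(\boldsymbol{x}^{\prime}_{j}\hat{\boldsymbol{\phi}}\big),$$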
where $e_{j}$ is the exposure variable and $\boldsymbol{x}_{j}$ are rating factors for the new contract. $\{\hat{\alpha}_T,\hat{b}_T\}$ are the fitted trend parameters from the most recent policy year for the reported claim and transaction frequency models. Also, $\{\hat{\boldsymbol{\beta}},\hat{\boldsymbol{\pi}},\hat{\boldsymbol{\phi}}\}$ are the fitted parameters for rating variables from the claim frequency model, transaction frequency model, and the severity model building blocks.
In this approach, including information on open claims (the number of reported open claims, the transaction frequency on open claims, and the payments on open claims) automatically accounts for the reserves on reported claims, because the regression parameters are estimated from, and therefore adjusted for, this information. In addition, the claim frequency regression parameters are adjusted for IBNR claims, which accounts for IBNR reserves. Further, trends over time are captured by the trend parameter estimates $\{\hat{\alpha}_T,\hat{b}_T\}$, which help adjust the experience to a cost level applicable to a future effective period. Together, the time trends, the other rating variables, and the open claims data help capture environmental changes promptly. Finally, regulatory requirements are an important challenge in implementing advanced ratemaking models, in particular the need to explain and quantify the effect of each policyholder characteristic on the total premium. Because the MPP loss cost is a product of the fitted terms from the three layers, the effect of each rating variable can be expressed as a multiplicative relativity, which helps satisfy this requirement.
We note that the predicted loss cost in (3.9) is based on reported claims on which the insurer has made payments. However, a claim may be reported without any payment transaction having been made by the ratemaking date. If the number of payment transactions $M_{jt,i}$ in the transaction frequency model is allowed to take the value zero in such cases, then the expected payment per transaction is given by $\hat{\mu}_{jt,ik}=q_{j}\times \exp(\boldsymbol{x}^{\prime}_{jt}\hat{\boldsymbol{\phi}})$, where $q_{j}$ is the probability of a positive payment transaction, estimated from the transaction-level data.
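A sketch of the rating formula: the expected loss cost is the exposure times the expected reported claims per unit exposure, times the expected transactions per claim, times the expected payment per transaction, each evaluated at the most recent year's trend parameters. The explicit severity intercept `phi0` and the optional probability `q_pos` of a positive payment (the zero-transaction variant above) are our assumptions about how these terms enter, and all parameter values are hypothetical. Writing the result as a base amount times one relativity per rating variable shows how the multiplicative form supports explainability.

```python
import math


def dot(x, coef):
    return sum(xk * ck for xk, ck in zip(x, coef))


def mpp_loss_cost(exposure, x, alpha_T, b_T, beta, pi, phi, phi0=0.0, q_pos=1.0):
    """Exposure x claim frequency x transaction frequency x payment per transaction."""
    claim_freq = alpha_T * math.exp(dot(x, beta))     # reported claims per unit exposure
    txn_freq = b_T * math.exp(dot(x, pi))             # payment transactions per claim
    severity = q_pos * math.exp(phi0 + dot(x, phi))   # expected payment per transaction
    return exposure * claim_freq * txn_freq * severity


def relativities(x, beta, pi, phi):
    """Combined multiplicative effect of each rating variable across the three blocks."""
    return [math.exp(xk * (bk + pk + fk)) for xk, bk, pk, fk in zip(x, beta, pi, phi)]


params = dict(alpha_T=0.08, b_T=2.5, beta=[0.2, 0.1], pi=[0.05, 0.0], phi=[0.3, -0.2], phi0=8.0)
x_new = [1.0, 0.5]
cost = mpp_loss_cost(1.0, x_new, **params)
base = 1.0 * params["alpha_T"] * params["b_T"] * math.exp(params["phi0"])
rels = relativities(x_new, params["beta"], params["pi"], params["phi"])
```

The loss cost equals `base` multiplied by the relativities, so each rating variable's contribution to the premium can be reported separately.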
4. Ratemaking performance evaluation using simulated data
This section highlights the importance of open claims in the ratemaking process using simulated data from the MPP model. We show that using only closed claims for ratemaking leads to biased estimates and inaccurate premiums. We also underscore the advantages of the MPP model over the frequency-severity model that incorporates information on open claims.
4.1. Simulation design
In this simulation, for simplicity, the number of claims for policyholder j within a policy year t is assumed to be a homogeneous Poisson regression with the conditional mean specified as:
where $\boldsymbol{x}_{jt}=\{x_{j1},x_{j2}\}$ are policy-level covariates. We assume $x_1\sim Bernoulli(0.3)$, representing a discrete rating variable, and $x_2\sim Normal(0,1)$, corresponding to a continuous rating variable. The reporting delay of claims is assumed to follow a Weibull distribution with a constant shape parameter $\kappa$ and scale parameter $\theta_i$ given by:
For this simulation, we use $\kappa=0.2$ and ${\gamma}=\{1.5, 0.3, 0.1\}$. In addition, claim occurrence times $v_i$ are assumed to follow a Uniform(0, 5) distribution, and we set $\tau=5$. Given that a claim is reported, $M_{jt,i}$, the number of transactions to the settlement of claim i reported in the observation period $\{jt\}$, is also assumed to follow a homogeneous Poisson regression with the conditional mean specified as:
Further, given that there are $k=1,\ldots, M_{jt,i}$ transactions, the payments are assumed to be from a gamma regression with logarithmic link function and dispersion parameter $1/\sigma$ . The conditional mean is specified as:
Finally, we let $\delta_{jt,i}\sim Bernoulli(\eta)$ , that is, whether a claim is closed or censored is simulated using a Bernoulli distribution with probability $\eta$ . We vary $\eta$ to examine the effect of the proportion of closed claims in the portfolio on the estimation and prediction results. The parameters used for data generation under the claim frequency, transaction frequency, and transaction severity models are shown in Table 1. We employ Algorithm 1 in the Online Appendix to construct the simulation data.
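The simulation design can be sketched as below. Only the covariate distributions, the Weibull delay with $\kappa=0.2$ and $\gamma=\{1.5, 0.3, 0.1\}$, the Uniform(0, 5) occurrence times, $\tau=5$, and $\delta_{jt,i}\sim Bernoulli(\eta)$ come from the text. We additionally assume log-linear means $\exp(c_0+c_1x_1+c_2x_2)$ in every block (including the Weibull scale), draw one claim count per policy over $[0,\tau]$, measure delays on the same time scale as occurrences, read the gamma shape as $\sigma$ (dispersion $1/\sigma$), and use hypothetical values for the frequency, transaction, and severity parameters because Table 1 is not reproduced here. How the partial development of open claims is generated is specified in Algorithm 1 of the Online Appendix and is not reproduced; this sketch simply stores the full development alongside $\delta$.

```python
import math
import random


def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def linear(c, x1, x2):
    return c[0] + c[1] * x1 + c[2] * x2


def simulate_portfolio(n_policies, beta, pi, phi, shape, eta, rng,
                       kappa=0.2, gamma=(1.5, 0.3, 0.1), tau=5.0):
    """Return one record per claim reported by tau (IBNR claims are dropped)."""
    records = []
    for j in range(n_policies):
        x1 = 1 if rng.random() < 0.3 else 0          # x1 ~ Bernoulli(0.3)
        x2 = rng.gauss(0.0, 1.0)                     # x2 ~ Normal(0, 1)
        for _ in range(poisson_draw(rng, math.exp(linear(beta, x1, x2)))):
            v = rng.uniform(0.0, tau)                                       # occurrence time
            u = rng.weibullvariate(math.exp(linear(gamma, x1, x2)), kappa)  # reporting delay
            if v + u > tau:
                continue                                                    # IBNR at tau
            m = poisson_draw(rng, math.exp(linear(pi, x1, x2)))             # transactions to settlement
            mean_pay = math.exp(linear(phi, x1, x2))
            payments = [rng.gammavariate(shape, mean_pay / shape) for _ in range(m)]
            records.append({"policy": j, "x1": x1, "x2": x2, "v": v, "u": u, "m": m,
                            "payments": payments, "closed": rng.random() < eta})
    return records


# Hypothetical parameter values (Table 1 is not reproduced here).
claims = simulate_portfolio(1000, beta=(-0.5, 0.3, 0.1), pi=(0.7, 0.2, -0.1),
                            phi=(7.0, 0.4, 0.2), shape=2.0, eta=0.8, rng=random.Random(3))
```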
4.2. Parameter estimates
The estimation results using the simulated data at the ratemaking date are given in Table 1 and are based on $S=100$ replications. Following the likelihood-based method described in Section 3.1, we obtain the parameter estimates and the associated standard errors for each simulated sample based on the data at the ratemaking date. We present results for different proportions of closed claims and different numbers of policies. Specifically, we report estimation results when the proportion of closed claims is $30\%$, $80\%$, and $100\%$ and when the number of policies is 500, 1000, and 1500.
In Table 1, we present the average bias (Bias), the nominal standard deviation of the point estimates (SD), and the average standard error (SE) of the estimates. Both the average bias and its uncertainty for the parameters in the claim frequency, transaction frequency, and transaction severity models decrease as the number of policies increases. Further, for the transaction frequency and transaction severity models, the average bias and its uncertainty decrease as the proportion of closed claims in the portfolio increases. This observation indicates that ignoring open claims only leads to more biased parameter estimates. The results also show that the average standard error is comparable to the nominal standard deviation, indicating that the variance estimates are accurate.
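For a single parameter, the three summaries can be computed from the replications as below: Bias averages the estimation error, SD is the empirical standard deviation of the point estimates, and SE averages the model-based standard errors, so SE close to SD indicates accurate variance estimates. The function name and the input values are hypothetical.

```python
import statistics


def replication_summary(estimates, std_errors, true_value):
    """Bias, SD and SE of one parameter over simulation replications."""
    bias = statistics.fmean([e - true_value for e in estimates])  # average estimation error
    sd = statistics.stdev(estimates)                              # spread of the point estimates
    se = statistics.fmean(std_errors)                             # average model-based SE
    return bias, sd, se


# Hypothetical output of four replications for a parameter whose true value is 1.0.
bias, sd, se = replication_summary([1.0, 1.2, 0.8, 1.0], [0.15, 0.17, 0.16, 0.16], 1.0)
```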
4.3. Loss cost prediction
This section focuses on the prediction performance of the proposed MPP model under different proportions of closed claims and numbers of policies in the portfolio. The results reported are based on $S = 100$ replications. The loss cost prediction for each policy is obtained using (3.9).
We compare the actual loss cost from the simulated data with the loss cost predictions from the MPP model and from a frequency-severity model that uses only closed claims to estimate parameters, which we call FS_Closed. For the frequency-severity model, a Poisson model is used for the claim frequency, and a gamma GLM with a logarithmic link is used for the loss amounts of claims (Frees, 2014). See the Online Appendix for the rating formula based on the frequency-severity model. Further, we compare the loss cost prediction from the MPP model with that of a frequency-severity model that incorporates information on all reported claims, which we call FS_All. For the open claims in the FS_All model, we use the actual ultimate amounts in the model of loss amounts to eliminate the uncertainty in reserve predictions. This implies that the loss cost predictions from FS_All represent the best-case scenario for the frequency-severity model.
The Gini index, which was motivated by the economics of insurance and developed in Frees et al. (2011), is employed to compare the loss cost predictions of the different models with the actual loss cost. The Gini index is a measure of profit; thus, insurers that adopt a rating structure with a larger Gini index are more likely to enjoy a profitable portfolio. The Gini index can be interpreted as the covariance between the profit (loss minus premium) and the rank of relativities (score divided by premium). In addition, the Gini index can be described as twice the area between the ordered Lorenz curve and the 45-degree line (line of equality), where the ordered Lorenz curve is a graph of the ordered premium distribution versus the ordered loss distribution based on the relativity measure. The Gini index may range over $[-1, 1]$. The results in Table 2 assume that the insurer has adopted FS_Closed as the base premium for rating purposes and wishes to investigate an alternative scoring method (in this case, the MPP model) to understand the potential vulnerabilities of this base premium. Similarly, the results in Table 3 assume that the insurer has adopted FS_All as the base premium.
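A minimal sketch of the Gini index from the ordered Lorenz curve, using the trapezoidal form $1-\sum_i (F_{P,i}-F_{P,i-1})(F_{L,i}+F_{L,i-1})$ with policies sorted by the relativity (score divided by premium). In the setting above, `premium` would hold the base premium (for example, FS_Closed predictions), `score` the alternative predictions (MPP), and `loss` the actual losses. Ties in the relativity are not treated specially, standard errors are not computed, and the toy portfolio is hypothetical.

```python
def gini_index(premium, score, loss):
    """Gini index from the ordered Lorenz curve (trapezoidal form).

    Policies are sorted by relativity score / premium; F_P and F_L are the cumulative
    shares of premium and loss, and Gini = 1 - sum (F_P,i - F_P,i-1) (F_L,i + F_L,i-1).
    """
    order = sorted(range(len(premium)), key=lambda i: score[i] / premium[i])
    total_p, total_l = sum(premium), sum(loss)
    gini, fp_prev, fl_prev, cum_p, cum_l = 1.0, 0.0, 0.0, 0.0, 0.0
    for i in order:
        cum_p += premium[i]
        cum_l += loss[i]
        fp, fl = cum_p / total_p, cum_l / total_l
        gini -= (fp - fp_prev) * (fl + fl_prev)
        fp_prev, fl_prev = fp, fl
    return gini


# Hypothetical portfolio: flat base premium, one policy carries all the loss.
premium = [1.0, 1.0, 1.0, 1.0]
loss = [0.0, 0.0, 0.0, 4.0]
informative = gini_index(premium, [0.1, 0.1, 0.1, 4.1], loss)  # 0.75
misleading = gini_index(premium, [4.1, 4.1, 4.1, 0.1], loss)   # -0.75
```

A score that ranks the risky policy last yields a positive Gini, signalling that the base premium misprices that policy, while a score that ranks it first yields a negative Gini, illustrating the $[-1, 1]$ range.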
Table 2 summarizes several comparisons using the Gini index when the number of policies is 500, 1000, and 1500, with different proportions of closed claims. We report the average Gini index and the average standard error (SE) of the Gini index estimates. The standard errors were derived in Frees et al. (2011), where the asymptotic normality of the Gini index was proved. With 500 policies and a 30% proportion of closed claims, the Gini index is $11.50\%$, which indicates that insurers using FS_Closed for premiums could use the MPP model to detect discrepancies between their loss and premium distributions. The standard error implies that the discrepancy is statistically significant. Similar observations are made when the number of policies increases to 1000 and 1500 with a 30% proportion of closed claims. However, the discrepancies may not be significant when the proportion of closed claims increases to 80% and 100%, meaning that the frequency-severity model based on closed claims and the MPP model produce similar predictions only when the proportion of closed claims is high.
Further, the Gini index results in Table 3, where FS_All is the base premium, show that the MPP model produces loss costs that are slightly better than those from the best-case scenario of the frequency-severity framework. However, the standard errors indicate that the discrepancies may not be statistically significant. Even with similar loss cost predictions from the MPP and FS_All models, the MPP framework offers the advantage of performing the ratemaking exercise without needing a reserving model to develop claims to ultimate.
To show that the MPP can account for claim reserves, Table 4 presents an actual-versus-expected analysis of the total loss cost values (A/E). The A/E ratio is a common actuarial model projection tool that expresses the actual total loss cost as a percentage of the loss cost predicted by a model. We report the average A/E scores and the nominal standard deviation (SD) of the A/E scores. For accurate model projections, we expect the average A/E to be close to 100%. The results show that the MPP performs better than the best-case scenario of the frequency-severity model (FS_All) because the FS_All model does not account for IBNR claims; hence, the FS_All model will only be competitive when its claim occurrence frequency model is adjusted to account for IBNR claims. The frequency-severity model based only on closed claims (FS_Closed) performs worse still because, in addition to not accounting for IBNR claims, it does not use the information from reported claims that are still open. This further shows that the MPP framework allows the ratemaking exercise to be performed without a reserving model to develop claims to ultimate, while also accounting for IBNR claims.
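The A/E summary can be sketched as follows, assuming each simulation replication supplies the actual losses and a model's predicted loss costs for the same policies. The names `ae_ratio` and `summarize_ae` are hypothetical, and the SD here uses the sample ($n-1$) divisor, which may differ from the convention behind Table 4.

```python
import statistics


def ae_ratio(actual, expected):
    """Actual total loss cost as a percentage of the predicted total."""
    return 100.0 * sum(actual) / sum(expected)


def summarize_ae(replications):
    """replications: list of (actual_losses, predicted_loss_costs) pairs.

    Returns the average A/E (%) and its standard deviation across replications.
    """
    scores = [ae_ratio(actual, expected) for actual, expected in replications]
    return statistics.mean(scores), statistics.stdev(scores)


# Two toy replications: one exact projection (100%) and one over-prediction (90%)
mean_ae, sd_ae = summarize_ae([([60, 40], [50, 50]), ([45, 45], [50, 50])])
```

An average A/E above 100% signals under-prediction, which is what one would expect from a model that ignores IBNR claims.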
5. Empirical analysis using the MPP
5.1. Data
The data we use for the ratemaking exercise in this paper are from the Wisconsin Local Government Property Insurance Fund (LGPIF). The Fund insures local government entities, including counties, cities, towns, villages, school districts, and library boards. The primary coverage of the Fund, and the focus of this paper, is building and contents coverage, although the Fund also provides coverage for inland marine (construction equipment) and motor vehicles. The dataset has been used in other ratemaking papers; see, for example, Frees and Lee (Reference Frees and Lee2015).
Although the LGPIF data span January 1, 2006, to December 31, 2013, we focus on effective years 2006–2011, for which all claims are marked as closed as of December 31, 2013. We use data from the policy, claim, and transaction databases. Table 5 shows summary statistics at the policy and claim levels for effective years 2006–2011. At the policy level, average claim frequencies and severities vary considerably across years, highlighting the importance of using current information in claim modeling for ratemaking purposes. Further, the claim-level summary statistics show that the average number of payment transactions to settlement per claim is gradually decreasing, while the average payment per transaction is increasing. This suggests a change in the LGPIF's claims processing. Such environmental changes affect the distribution of future losses, and using current information on open claims allows these changes to be captured promptly.
Table 6 describes the rating variables considered in this paper. Tables 1 and 2 in the Online Appendix show that the rating variables are correlated with claim frequency and severity at the policy level and with transaction severity at the claim level, which suggests that they are likely to be significant predictors of claims in the ratemaking model. For the ratemaking exercise, data from effective years 2006–2009 are used as the training sample to calibrate the MPP model. We assume that policyholders' rates have to be updated by December 31, 2009, for policy year 2010. The rating factors from the calibrated MPP model are then applied to the 2010 rating variables to predict 2010 loss costs (claim scores).
The left panel of Figure 2 presents the relationship between the total amount paid and the number of claims per policy in the training sample; all policies are one-year policies. The plot suggests that the total amount paid by the insurer increases with the number of claims reported in a policy year. The right panel shows a similar positive relationship between the ultimate amount paid for a claim and the number of transactions to settlement. Small circles identify outliers. Table 1 in the Online Appendix shows that the transaction payments are heavily right-skewed. The left panel of Figure 3 shows the histogram of the transaction payment amounts in the training sample, using a log scale to handle the skewness. The reporting delay is a key driver of IBNR claims, and the right panel of Figure 3 shows the distribution of the reporting delays in months, which also appears right-skewed.
Table 7 summarizes the number of closed, RBNS, and IBNR claims as of December 31, 2009. As expected, the most recent effective year, 2009, has the largest numbers of RBNS and IBNR claims. Ratemaking models that rely only on closed claims therefore lose the current information needed to produce accurate premiums. The table also summarizes the average payments for reported claims (closed and RBNS claims) as of December 31, 2009. Across years, the average payments for RBNS claims are larger than those for closed claims, and the average RBNS payments tend to decrease as the effective year increases, indicating that larger claims take longer to settle. As discussed in Okine et al. (Reference Okine, Frees and Shi2022), this is because large and complicated claims naturally involve multiple interested parties, demand special expertise, and are more likely to be litigated. Hence, payments on RBNS claims can reflect changes in the Fund's risk profile and capture environmental changes in a more timely manner than closed claims.
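The closed, RBNS, and IBNR split in Table 7 depends only on each claim's event dates relative to the valuation date. The sketch below shows one way to classify a claim, assuming occurrence, report, and settlement dates are available (settlement `None` if the claim is still open at the end of the data); the function and argument names are hypothetical, not LGPIF field names.

```python
from datetime import date
from typing import Optional


def claim_status(occurred: date, reported: date,
                 settled: Optional[date], valuation: date) -> str:
    """Classify a claim relative to a valuation (ratemaking) date."""
    if occurred > valuation:
        return "not yet occurred"  # outside the experience period
    if reported > valuation:
        return "IBNR"              # incurred but not yet reported
    if settled is not None and settled <= valuation:
        return "closed"            # reported and settled by the valuation date
    return "RBNS"                  # reported but not settled


valuation = date(2009, 12, 31)
```

Because the LGPIF claims from 2006–2011 are all closed by December 31, 2013, such a classification can be applied retrospectively at the earlier ratemaking date.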
For a robustness check, we also present an analysis of the MPP framework using effective years 2007–2010 as the training sample and compare the loss cost predictions from this model with the out-of-sample data from 2011.
5.2. Estimation results
This section presents the estimation results from the three building blocks in the MPP framework fitted using maximum likelihood. The training data contain observations from effective years 2006–2009, but we also show the parameter estimates using only data from the recent effective year 2009 to examine the effect on estimation when the proportion of closed claims in the portfolio is reduced.
Figure 4 explores different distributions for the transaction payment amounts, plotting the empirical distribution function of the transaction payment amounts against the distribution functions of the fitted Gamma, Pareto, and Lognormal distributions. The Pareto and Lognormal distributions both appear to provide a reasonable fit. However, the Akaike Information Criterion (AIC) is 69,749.6 for the Lognormal fit and 69,758.0 for the Pareto fit; because a lower AIC is preferred, we select the Lognormal distribution for the transaction payment block.
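The AIC comparison can be illustrated with the closed-form maximum likelihood fit of the Lognormal distribution, where AIC $= 2k - 2\ell$ for $k$ parameters and maximized log-likelihood $\ell$. This is a sketch only: the function names are hypothetical, and the Pareto fit, which typically requires numerical optimization, is not shown. The two AIC values in the usage line are the ones reported above.

```python
import math


def lognormal_mle(x):
    """Closed-form Lognormal MLE; returns (mu, sigma2, maximized log-likelihood)."""
    logs = [math.log(v) for v in x]
    n = len(logs)
    mu = sum(logs) / n
    sigma2 = sum((y - mu) ** 2 for y in logs) / n  # MLE divides by n, not n - 1
    # log f(x) = -log x - 0.5 * log(2 pi sigma2) - (log x - mu)^2 / (2 sigma2)
    loglik = sum(
        -y - 0.5 * math.log(2.0 * math.pi * sigma2) - (y - mu) ** 2 / (2.0 * sigma2)
        for y in logs
    )
    return mu, sigma2, loglik


def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 * loglik (lower is better)."""
    return 2 * n_params - 2 * loglik


def select_lowest_aic(aics):
    """Return the name of the model with the smallest AIC."""
    return min(aics, key=aics.get)


# AIC values reported in the text
print(select_lowest_aic({"Lognormal": 69749.6, "Pareto": 69758.0}))  # Lognormal
```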
Figure 5 compares the reporting delay distribution with a fitted mixed distribution that places a probability mass at a reporting delay of zero and uses a Weibull distribution for reporting delays above zero. The plots suggest that the mixed distribution fits the observed reporting delays reasonably well. Because the reporting delay distribution may vary with policyholder characteristics, we extend the Weibull distribution for reporting delay to include the entity types. Table A.1 in the Appendix shows the parameter estimates of the fitted Weibull model in (2.6).
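A minimal sketch of the mixed reporting-delay distribution, assuming a point mass $p_0$ at zero and a Weibull distribution with shape $k$ and scale $\theta$ for positive delays (in months). Because the likelihood factorizes, the maximum likelihood estimate of $p_0$ is the proportion of zero delays, and the Weibull part can be fitted to the positive delays alone. The entity-type effects of model (2.6) are omitted, and the function names are hypothetical. The CDF gives the probability that a claim occurring $d$ months before the valuation date has already been reported, a quantity of the kind used to adjust for IBNR claims.

```python
import math


def delay_cdf(d, p0, shape, scale):
    """P(D <= d): point mass p0 at zero plus a Weibull(shape, scale) for D > 0."""
    if d < 0:
        return 0.0
    return p0 + (1.0 - p0) * (1.0 - math.exp(-((d / scale) ** shape)))


def delay_loglik(delays, p0, shape, scale):
    """Log-likelihood of observed reporting delays under the mixed distribution."""
    ll = 0.0
    for d in delays:
        if d == 0:
            ll += math.log(p0)
        else:
            z = d / scale
            # Weibull log-density: log(k / theta) + (k - 1) log z - z^k
            log_weibull = math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape
            ll += math.log(1.0 - p0) + log_weibull
    return ll


def mle_zero_mass(delays):
    """MLE of p0: the share of delays equal to zero."""
    return sum(1 for d in delays if d == 0) / len(delays)
```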
Table A.2 in the Appendix presents the estimation results for the MPP model using the information on reported claims at the ratemaking date, following the likelihood-based method described in Section 3.1. The rating variables used are described in Table 6. When all the training data are used, the claim frequency model captures the trend in reported claims across years; for loss cost prediction, the baseline parameter from the most recent year, $\alpha_{2009}$, is used. The results for the censored Poisson transaction frequency model and the Lognormal severity model are provided in the middle and bottom layers of Table A.2. The effect of a rating variable is obtained by adding up its effects from all three building blocks. For example, using all the training data, the parameter estimate for LnPolicyCov is positive in the claim frequency model (1.107) and negative in both the transaction frequency model ($-0.022$) and the transaction payment model ($-0.349$). The overall effect is therefore positive (0.737, which equals $1.107 - 0.022 - 0.349$ up to rounding of the reported estimates). In addition, the last two columns of Table A.2 provide results using all the training data with policy coverage (LnPolicyCov) and time (policy years) as interaction variables. The results suggest that the effect of policy coverage on claim frequency differs across years.
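To see why effects add across the building blocks, suppose (as a simplification) that the three blocks are independent, the claim and transaction frequency means use log links, and the transaction payment is Lognormal with location $\mu$ and dispersion $\sigma^2$. The expected loss cost is then the product of the expected number of claims, the expected number of transactions per claim, and $\exp(\mu + \sigma^2/2)$, so its logarithm is the sum of the three linear predictors. The sketch uses the LnPolicyCov coefficients reported above; the intercepts and $\sigma^2$ are hypothetical, and the censoring of the transaction counts is ignored.

```python
import math

# LnPolicyCov coefficients reported in the text (all training data)
BETA_CLAIM_FREQ = 1.107   # claim frequency block
BETA_TRANS_FREQ = -0.022  # transaction frequency block
BETA_TRANS_PAY = -0.349   # Lognormal transaction payment block (location mu)

# Hypothetical intercepts and dispersion, for illustration only
A_CLAIM, A_TRANS, A_PAY, SIGMA2 = -2.0, 0.5, 8.0, 2.0


def expected_loss_cost(ln_policy_cov):
    """E[loss cost] = E[#claims] * E[#transactions per claim] * E[payment]."""
    claim_rate = math.exp(A_CLAIM + BETA_CLAIM_FREQ * ln_policy_cov)
    trans_per_claim = math.exp(A_TRANS + BETA_TRANS_FREQ * ln_policy_cov)
    mu = A_PAY + BETA_TRANS_PAY * ln_policy_cov
    mean_payment = math.exp(mu + SIGMA2 / 2.0)  # Lognormal mean
    return claim_rate * trans_per_claim * mean_payment


combined = BETA_CLAIM_FREQ + BETA_TRANS_FREQ + BETA_TRANS_PAY
# A one-unit increase in LnPolicyCov scales the loss cost by exp(combined)
ratio = expected_loss_cost(16.0) / expected_loss_cost(15.0)
```

Under these assumptions, a one-unit increase in LnPolicyCov multiplies the expected loss cost by roughly $\exp(0.736) \approx 2.09$, whatever the intercepts.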
5.3. Out-of-sample performance
This section provides loss cost predictions based on the MPP model fitted in Section 5.2; we do not consider expenses associated with the loss amounts. The predictions are generated from the 2010 out-of-sample rating variables and compared with the 2010 out-of-sample claims. Loss cost predictions are also provided for the MPP model with policy coverage and time as interaction variables, named MPP_Int. In addition, we compare the out-of-sample results with those of two frequency-severity models: one that uses only closed claims to estimate its parameters, named FS_Closed, and another that uses all reported claims, named FS_All. For the open claims in the FS_All model, we use the incurred payment (amount paid plus loss reserve) as an estimate of the ultimate payment amount. Further, we compare the 2010 out-of-sample premiums in the dataset, which were generated by an external agency, with the loss cost predictions from the models. These external agency premiums are the actual premiums paid by the policyholders for the 2010 effective year, and results based on them are labeled EA_Premiums.
Figure 6 shows scatterplots of the ranks of the 2010 out-of-sample claims against the ranks of the predicted loss costs from the models, where each point corresponds to a policyholder, together with a scatterplot of the ranks of the out-of-sample claims against those of the external out-of-sample premiums. The vertical line formed by the points corresponds to policyholders without losses in the 2010 effective year; rank ties are replaced by their mean. The rank correlation between the MPP scores and the 2010 out-of-sample claims is $48.89\%$, compared with $45.36\%$ for the external 2010 out-of-sample premiums. The higher correlation for the MPP model suggests that the MPP framework offers a promising alternative to the existing rating methodology.
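The rank correlations above can be sketched as a Pearson correlation computed on ranks, with tied values (such as the many policyholders with zero losses) assigned the mean of the ranks they span, matching the tie rule described for Figure 6. The function names are hypothetical; in practice a library routine such as SciPy's `spearmanr` would typically be used.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_correlation(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Two policyholders without losses share the mean rank 1.5
claims = [0, 0, 5, 2]
scores = [1, 2, 4, 3]
```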
The Gini index is employed to compare the loss cost predictions from the different models and the external out-of-sample premiums with the out-of-sample claims. As discussed in Section 4.3, insurers that adopt a rating structure with a larger Gini index are more likely to enjoy a profitable portfolio. We show that the MPP framework aligns premiums with the underlying risk better than the frequency-severity approach and is highly competitive with the existing rating methodology used to generate the external out-of-sample premiums. As seen from Table 5, the average severity for the out-of-sample year 2010 is higher than in other years because of two unusually large claims resulting from the Great Milwaukee Flood. These two claims were paid to two different policyholders, and the total amounts paid to these policyholders are $\$12,922,218$ and $\$4,776,206$. Hence, we provide the Gini index results with and without the policies with the unusual claims in the 2010 out-of-sample data.
Table 8 presents the Gini index results using FS_All as the base premium. The columns summarize the results for the FS_Closed and MPP scoring methods and for the external premiums (EA_Premiums). When the ratemaking model is built using data from effective year 2009 only, the frequency-severity model based on closed claims does not fare well compared with the base model, the MPP model's discrepancies relative to the base model are not statistically significant, and the external agency premiums fare well compared with the base premium (with or without the unusual claims). However, when data from effective years 2006–2009 are used in model building and the unusual claims are included in the 2010 out-of-sample data, the Gini indices for the MPP model and the external agency premiums are $49.24\%$ and $43.70\%$, respectively, and the MPP model with interaction variables has a Gini index of $58.70\%$. The standard errors imply that these differences are statistically significant. The results suggest that, compared with the base premium (FS_All), loss cost predictions from the MPP framework will lead to more equitable rates. Similar results are seen when the unusual claims are removed from the 2010 out-of-sample data. Note that when all the training data are used in model building, so that the proportion of closed claims is higher, the difference between the base model FS_All and FS_Closed is not statistically significant. The ordered Lorenz curves corresponding to the Gini indices with FS_All as the base premium are shown in Figure 7. The figure shows that the area between the Lorenz curve and the 45-degree line generally decreases when the unusual claims are removed, which visually explains the changes in the Gini index.
Table 9 further investigates the FS_All loss cost projection using reserve predictions instead of the incurred payments in the dataset. Here, the reserves for reported but not settled claims are estimated using two individual-level reserving methods: the joint model and the marked Poisson process. The former jointly models cumulative payments (a longitudinal model) and time to settlement (a survival model), incorporating the payment-settlement association in the reserving process. The latter models the entire claim process, including claim occurrence, reporting, and development after reporting. Both methods were applied to the LGPIF dataset for reserving purposes in Okine et al. (Reference Okine, Frees and Shi2022). FS_All_JM and FS_All_MPP denote the frequency-severity models that use the joint model and the marked Poisson process, respectively, to obtain the ultimate amount of reported but not settled claims. The models are based on data from effective years 2006–2009. Not surprisingly, the results are similar to those reported in Table 8.
Because the external agency premiums are the actual premiums paid by policyholders, they provide an interesting state-of-the-art baseline for model comparison. We therefore provide the Gini index results with the external agency premiums as the base premium in Table 10, and the corresponding ordered Lorenz curves in Figure 8. The results show that the MPP framework offers a promising alternative to the existing rating methodology.
Tables 11 and 12 provide a robustness check for the results in Tables 8 and 10, using data from effective years 2007–2010 as the training dataset and observations from effective year 2011 as the holdout sample. Again, the MPP models outperform the base models, which reinforces that the MPP model promotes equity in rates and will lead to a more profitable portfolio.
5.4. Limitations of the current framework
Based on simulations and real-world data analysis via the marked Poisson process, this paper has shown that extracting information from open claims leads to more accurate insurance pricing. However, the proposed framework has some limitations, discussed below, that will be the focus of future research:

1. Independence assumption between claim occurrence and marks, and between marks: In this paper, for simplicity, the claim occurrence process and the marks, such as the reporting delay process and transaction payments, are assumed to be independent. We also assume that the marks are independent of each other. However, independence is not required: dependence could be introduced through unobservable policy-specific random effects or copula modeling. For example, Shi et al. (Reference Shi, Feng and Boucher2016, Reference Shi, Fung and Dickinson2022) and Frees et al. (Reference Frees, Bolancé, Guillen and Valdez2021) accommodate the dependence in the multilevel structure of claims using copula modeling, and Okine et al. (Reference Okine, Frees and Shi2022) use random effects to capture the association between the size of claims and time to settlement. We note, however, that Frees et al. (Reference Frees, Lee and Yang2016) verify that dependence modeling has little influence on claim scores under the frequency-severity approach. Incorporating dependence modeling into the MPP framework is left for future work.

2. Modeling claim and transaction frequencies with distributions other than the Poisson: Because this paper employs the marked Poisson process framework, the claim and transaction frequencies are modeled with the Poisson distribution. The Poisson process has attractive mathematical properties for insurance applications and is thus the most popular claim number process. However, insurance claim counts often exhibit overdispersion, so other distributions, such as the negative binomial or zero-inflated distributions, are used to model claim frequencies. One approach to modeling overdispersion is to include random effects in the intensity function to capture process dependencies and unobservable characteristics (Cook and Lawless, Reference Cook and Lawless2007). Future work will focus on incorporating overdispersion into the proposed framework.
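As a worked illustration of the random-effects route to overdispersion mentioned in item 2, assume a policy-level multiplicative random effect $\Theta$ that follows a Gamma distribution with shape and rate both equal to $\nu$, so that it has mean one and variance $1/\nu$ (an assumption for illustration, not the paper's specification). The conditional Poisson count is then overdispersed marginally:

```latex
\begin{aligned}
N \mid \Theta &\sim \text{Poisson}(\lambda \Theta), \qquad
\Theta \sim \text{Gamma}(\nu, \nu), \quad \mathrm{E}[\Theta] = 1, \ \mathrm{Var}(\Theta) = 1/\nu, \\
\mathrm{E}[N] &= \mathrm{E}\big[\mathrm{E}[N \mid \Theta]\big] = \lambda, \\
\mathrm{Var}(N) &= \mathrm{E}\big[\mathrm{Var}(N \mid \Theta)\big] + \mathrm{Var}\big(\mathrm{E}[N \mid \Theta]\big)
= \lambda + \frac{\lambda^2}{\nu} > \lambda .
\end{aligned}
```

Marginally, $N$ is negative binomial, and the Poisson case is recovered as $\nu \to \infty$.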
6. Conclusion
Through the ratemaking process, insurance rates are set to cover the total expected future cost, which includes liabilities from both RBNS and IBNR claims. Actuaries develop rates by employing multivariate risk classification techniques based on information from the policy and claim history to better align premiums with claims experience. However, the literature suggests that the data used in the multivariate analysis are usually based on closed claims, for which the ultimate amount paid is known, leaving out open claims. Ignoring the information from open claims could lead to inaccurate rates because the ratemaking data lack the current information that may capture shifts in the risk profile of the insurer's book. Practicing actuaries are well aware of these biases and have developed cumbersome on-leveling methods to adjust the data to the current level and to develop claims to ultimate during the ratemaking exercise.
This paper employs the marked Poisson process (MPP) framework, which has primarily been used for micro-level reserving, to provide formal procedures for making these adjustments and to improve insurance pricing. The MPP framework ensures that the multivariate risk analysis uses information on claims closed by the ratemaking date, payments on claims not yet closed, and reporting times, so that rates are adjusted to account for IBNR claims. The MPP framework for ratemaking uses three hierarchical building blocks: the first drives the number of claims per policy, the second models the conditional number of payment transactions for a claim, and the third concerns the conditional payment size of each transaction. Each block is modeled with an appropriate GLM. The results using data from a property insurance provider show that the proposed framework promotes equity in the ratemaking algorithm and leads to a more profitable portfolio.
In the context of the COVID-19 pandemic, and with the insurance industry experiencing rapid product innovation and intense competition, the proposed approach, by using information on all reported claims, gives actuaries and regulators a more disciplined method of determining promptly whether rate increases are necessary.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/asb.2023.23.
Appendix
Appendix A: Estimation results for the MPP model
Table A.2 presents the estimation results for the MPP model based on reported claims at the ratemaking date, following the likelihood-based method described in Section 3.1 of the paper. The upper layer of the table presents the estimation results for the claim frequency Poisson model; the baseline trend parameters and the rating variables with nonsignificant parameter estimates are not shown. When using all the training data (without interaction), the coefficient for LnPolicyDed is negative, as expected, meaning that a higher deductible is associated with a lower claim frequency, but the coefficient becomes positive and significant when using only observations from effective year 2009. The coefficients for LnPolicyCov are positive and significant in both sets of results. Compared with the reference category “Village,” all entity types except “Town” experience lower claim frequency in both sets of results. In addition, the Region rating factor indicates significant differences in claim frequency driven by geographical location. Further, when using all the training data, the differences between the baseline parameters $\alpha_t$ and $\alpha_{2006}$ capture the trend in reported claims across years. The last two columns of Table A.2 provide results using all the training data with LnPolicyCov and policy years as interaction variables; the results suggest that the effect of policy coverage on claim frequency differs across years. The estimation results for the Lognormal severity model with dispersion parameter $\sigma^2$ are given in the third layer of Table A.2. The dependent variable is the observed transaction payment $P_{jt,ik}$, and the results show significant differences in claim transaction payments by geographical location. When using all the training data, the transaction payments also differ by entity type.