Estimating lengths-of-stay of hospitalised COVID-19 patients using a non-parametric model: a case study in Galicia (Spain)

Ana López-Cheda; María-Amalia Jácome; Ricardo Cao; Pablo M. De Salazar

doi:10.1017/S0950268821000959

Estimating lengths-of-stay of hospitalised COVID-19 patients using a non-parametric model: a case study in Galicia (Spain)

Published online by Cambridge University Press: 27 April 2021

and

Ana López-Cheda: Affiliation:
Universidade da Coruña, CITIC, MODES, A Coruña, Spain
María-Amalia Jácome*: Affiliation:
Universidade da Coruña, CITIC, MODES, A Coruña, Spain
Ricardo Cao: Affiliation:
Universidade da Coruña, CITIC, ITMATI, MODES, A Coruña, Spain
Pablo M. De Salazar: Affiliation:
Department of Epidemiology, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, USA
*: Author for correspondence: María-Amalia Jácome, E-mail: maria.amalia.jacome@udc.es

Article contents

Abstract
Introduction
Materials and methods
Results
Discussion
References

Rights & Permissions

Abstract

Estimating the lengths-of-stay (LoS) of hospitalised COVID-19 patients is key for predicting the hospital beds’ demand and planning mitigation strategies, as overwhelming the healthcare systems has critical consequences for disease mortality. However, accurately mapping the time-to-event of hospital outcomes, such as the LoS in the intensive care unit (ICU), requires understanding patient trajectories while adjusting for covariates and observation bias, such as incomplete data. Standard methods, such as the Kaplan-Meier estimator, require prior assumptions that are untenable given current knowledge. Using real-time surveillance data from the first weeks of the COVID-19 epidemic in Galicia (Spain), we aimed to model the time-to-event and event probabilities of patients’ hospitalised, without parametric priors and adjusting for individual covariates. We applied a non-parametric mixture cure model and compared its performance in estimating hospital ward (HW)/ICU LoS to the performances of commonly used methods to estimate survival. We showed that the proposed model outperformed standard approaches, providing more accurate ICU and HW LoS estimates. Finally, we applied our model estimates to simulate COVID-19 hospital demand using a Monte Carlo algorithm. We provided evidence that adjusting for sex, generally overlooked in prediction models, together with age is key for accurately forecasting HW and ICU occupancy, as well as discharge or death outcomes.

Keywords

COVID-19 forecasting ICU length-of-stay mixture cure model non-parametric

Type: Original Paper
Information: Epidemiology & Infection , Volume 149 , 2021 , e102

DOI: https://doi.org/10.1017/S0950268821000959 [Opens in a new window]
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2021. Published by Cambridge University Press

Introduction

As of January 2021, SARS-CoV-2 transmission continues to increase in most countries worldwide [1], and in those countries where control has been achieved, resurgences are expected [Reference Kissler2] before effective vaccines are widely available. Within the main challenges of the pandemic, overwhelming the healthcare systems has critical consequences on disease mortality [Reference Grasselli, Pesenti and Cecconi3]. Thus, understanding and predicting inpatient lengths-of-stay (LoS) and critical-care demand remain some of the major components of outbreak monitoring for decision-making and contingency planning.

Predicting hospital demand entails estimating a patient's LoS and the probability of hospital outcomes such as requiring admission to the intensive care unit (ICU). Estimation of these variables is challenging as it requires investigating the patients' trajectories, and it must account for complexities in the data. For example, the LoS of some inpatients may be censored because the study ends before the patient leaves the hospital facility. The LoS of COVID-19 patients has been studied using parametric models [Reference Lewnard4], semi-parametric methods [Reference Thai5] and non-parametric estimators [Reference Grasselli, Pesenti and Cecconi3, Reference Wang6].

Parametric and semi-parametric approaches are often preferred due to their simplicity and ease of interpretation, but they require the LoS to conform to a predefined fixed model. Estimations based on non-validated assumptions can be significantly biased. Thus, non-parametric approaches, which do not require model assumptions, should be used when estimating COVID-19 LoS in the absence of solid knowledge.

The Kaplan-Meier (KM) estimator [Reference Kaplan and Meier7] is the simplest and most frequent non-parametric estimator in medical time-to-event data. It assumes that all patients with missing outcomes would experience the event in the end. This assumption applies when analysing the duration of hospitalisation, that is, the total time in the institution of the hospital (which includes time in hospital ward (HW) and time in ICU), as all patients leave the hospital eventually. However, this assumption does not apply to a patient's LoS in the HW until some of the potential outcomes, such as admission to the ICU or until death, as not all patients need admission to the ICU or die. Thus, the KM estimator should not be used to estimate those LoS, as it is wrongly specified. Alternatively, mixture cure models (MCMs) [Reference Boag8] account for the situations when a proportion of individuals will not experience the event being analysed.

Here, we propose a non-parametric MCM (NP-MCM) for estimating the LoS until specific events that are not experienced by all the patients. Specifically, we computed the following five LoS: LoS in HW until admission to ICU, LoS in HW until discharge from HW, LoS in HW until death in HW, LoS in ICU until discharge from ICU and LoS in ICU until death in ICU. Note that the KM estimator would not be biased to model the LoS until discharge if all status on discharge were gathered as a composite outcome. Although only the first LoS is necessary to model ICU demand, the other LoS are also of interest, as they are useful to estimate the conditional probability that a patient experiences each of those events according to the corresponding observed LoS. Finally, we also estimated the probability of each event.

First, to illustrate how our model improves data fitting, we compared the NP-MCM to the KM estimator (which assumes that all the individuals will experience the event) and to the empirical (E) estimator (which discards all observations which event is not observed) for a dataset of COVID-19 patients from the first weeks of the epidemic in Spain. We further simulated inpatient and critical care incidence during an outbreak, along with the final outcome (discharge or death), using the estimated values and adjusting for age and sex. Our model shows the importance of these individual variables for predicting hospital demand during transmission.

Materials and methods

Data source

The dataset contains 10 454 confirmed COVID-19 cases reported in Galicia (North-West Spain), from March 6th to May 7th 2020. Since not all of them required hospitalisation, our study only included the 2453 patients who were admitted in hospital/ICU during that period. For the patients with several HW and/or ICU admissions, the considered LoS was the first recorded one, that is, the number of days from their first entrance in HW/ICU until they experienced the outcome of interest. Data were provided by the regional public health authority, Dirección Xeral de Saúde Pública [9]. The data included information on age and sex; the dates of COVID-19 diagnosis, admission to the hospital and/or ICU and the patient's last known clinical status. A summary of the dataset can be found in Supplementary material Section S1, see also [Reference Gude10] for other results on this dataset.

Model formulation

MCMs [Reference Boag8], a special case of cure models [Reference Maller and Zhou11], explicitly model survival as a mixture of two types of patients: those who will experience the final outcome and those who will not (known as ‘cured’). Note that here a ‘cured’ individual is defined as being free of experiencing the event of interest, not necessarily cured in medical terms. The goal of MCM is to estimate the probability of experiencing the event and the distribution of the time to the event. The model is formulated as follows.

Let us denote Y as the time to the event of interest (admission to ICU, death or discharge), with survival function S(t) = P(Y > t). Let p = P(Y < ∞) be the probability that the event will happen, and S ₀(t) = P(Y > t|Y < ∞) be the survival function of the individuals experiencing the event. MCM write the survival function as S(t) = (1 − p) + pS ₀(t). Then the probability of the event, p, and the survival function of the time-to-event, S ₀(t), can be estimated using a proper estimator of the survival function, S(t), and the relations:

(1)$$p = 1-S( \infty ) \;\;{\rm and}\;\;S_0( t ) = \displaystyle{{S( t ) \;-\;( {1-p} ) } \over p}$$

When there is a group of patients known not to experience the event, the survival function S(t) can be estimated non-parametrically as follows [Reference Safari, López-de-Ullibarri and Jácome12]:

(2)$$\hat{S}( t ) = \mathop \prod \limits_{t_i\;< \;t}^{} \left({1\;-\;\displaystyle{{d_i} \over {n - i + 1 + \mathop \sum \nolimits_{\,j = 1}^{i-1} x_{\,j\;}}}} \right), $$

where t_i is the observed time-to-event of all the patients, d_i indicates if the event was observed and x_i is the indicator of whether the event was not observed because it is known it will never happen. To note, this estimator reduces to the well-known KM estimator in a classical time-to-event analysis when the event happens for all patients or when there is no way to identify the patients who will not experience the event ever.

The estimator of S(t) in (2) is computed with R software [13] and used to estimate the probability, p, of the event and the time-to-event survival function S ₀(t) using the relationships in (1) for the five LoS aforementioned in the Introduction. Details on each LoS, along with an R script for the computation of the different estimators, can be found in the Supplementary material.

The NP-MCM survival estimator of S ₀(t) is compared to the KM estimator computed with two different datasets: (i) the complete set of observations, considering all the patients who are known not experience the event as simply right censored regardless if they might experience it in the future or not (complete KM), and (ii) a reduced dataset, dismissing the patients who will not ever experience the event (reduced KM). The empirical (E) estimator, which considers only patients whose final event is observed and disregards the right censored observations, has also been considered. The NP-MCM estimator of the probability, p, of the event was computed using the estimator of S(t) in (2) and the relationships in (1). The E estimator of p, given by the ratio between the number of observed events and the total number of patients, was computed to motivate the proposed NP-MCM estimator of p (see Supplementary material Section S2 for details).

The NP-MCM estimator of S(t) in (2), the E estimator and the KM estimator do not incorporate possible covariate effects, such as sex and age. When the final outcome is experienced by all the patients, the extension of the KM estimator to handle covariates is the generalised product-limit estimator [Reference Beran14] of the conditional survival function, S(t|x). When the final outcome is not experienced by all the patients (‘cured’ individuals, all unidentified) the incorporation of covariates in the estimation of the probability of the final outcome p(x) and the distribution of the times until the event S ₀(t|x) has been studied recently [Reference Xu and Peng15–Reference López-Cheda, Jácome and Cao17] and implemented in the R package npcure [Reference López-de-Ullibarri, López-Cheda and Jácome18], which also performs significance tests for the probability of the event p(x). When the final outcome is not experienced by all the patients (‘cured’ individuals, some of them identified as it happens for our COVID-19 data), the extension of these methods for the NP-MCM model to estimate p(x) and S ₀(t|x) has been recently addressed [Reference Safari, López-de-Ullibarri and Jácome12], where evidence of the superiority of the NP-MCM over the traditional methods is shown. These conditional estimators of S(t|x) [Reference Beran14], p(x) and S ₀(t|x) [Reference Safari, López-de-Ullibarri and Jácome12, Reference Xu and Peng15–Reference López-Cheda, Jácome and Cao17] can handle continuous covariates such as age, using the information from all the individuals to provide estimates for one single value, e.g. 40 years. Ignoring the effect of age and sex on these estimates can produce important bias in the statistical analysis.

COVID-19 outbreak simulation model

We further simulated a COVID-19 outbreak based on the NP-MCM estimates of the five LoS considered, with two different models: (1) the simplest possible where the distributions of the LoS and probabilities of moving from one state (HW, ICU) to another (HW, ICU, death, discharge) do not depend on individual covariates; and (2) a more realistic one with the LoS and transition probabilities depending on the age and sex. For the ease of computation, the simulated LoS were not generated directly from the NP-MCM estimates but from the parametric distribution that best fitted the NP-MCM estimates, specifically, Weibull distributions (see Supplementary Fig. S4 for the NP-MCM estimates and their corresponding Weibull counterparts).

The simulated outbreak consisted of N = 1000 infected individuals. For the i-th infected individual i = 1, …, N we simulated the sex G_i (0 = male, 1 = female) and the age A_i (years) using the real distributions of the reported COVID-19 cases in Galicia on May 7th 2020 (Supplementary Table S1). As not all the infected individuals required hospitalisation, let H ⊂ {1, …, N} be the infected subjects admitted to the hospital. The trajectory of every hospitalised patient i ∈ H was obtained by simulating the transitions between states (HW, ICU, discharge, death) using the NP-MCM estimated probabilities. The times in each state were simulated from the Weibull distributions that best fitted the NP-MCM estimates, both conditional and unconditional on the age and sex of the patient (see Supplementary material Section S4 for further details; Supplementary Figs S4; S5 and S2, Table S3). From the evolution of all the hospitalised patients, it is straightforward to compute the number of patients in every state. We simulated 1000 outbreaks of N = 1000 infected people, so the mean number of patients in a HW, in the ICU, dead and discharged can be approximated by a Monte Carlo simulation for each day as a function of time. For supporting the goodness of fit of the conditional model that considers the age and sex of the patient, the real number of inpatients in the COVID-19 dataset has been taken as reference, rescaled to N = 1000 infected people.

Results

We first compared the estimates of the LoS using the NP-MCM estimator with the E estimator, and the KM estimator with the complete and reduced dataset.

When an event happens for all patients (a.s. ‘leave the hospital’, when all status on discharge gathered as a composite outcome), KM is not biased and coincides with the NP-MCM, both of them represented with the one single line in Figure 1 (top left) and Supplementary Figure S2 (top left). The NP-MCM and KM estimators consider the n = 2453 hospitalised patients of which 2142 experienced the event (they left the hospital within the study's timeframe). The E estimator considers only the 2142 patients who left the hospital, disregarding the information from the 311 patients still in hospital. This biases the E estimate towards shorter LoS, as hospitalised patients with longer LoS are not been included in the estimation.

Fig. 1. Estimates of the survival function of LoS using NP-MCM (thick black line), KM with the complete dataset (thin light grey line), KM with the reduced dataset (thin dark grey line) and the E estimator (red line) for all the COVID-19 hospitalised cases (n = 2453) in Galicia (Spain), when the LoS is the time of hospitalisation both in HW and ICU (top left), time in HW until admission to ICU (top right), time in HW until death in HW (middle left), time in HW until discharge (middle right), time in ICU until death in ICU (bottom left) and time in ICU until discharge from ICU (bottom right). The NP-MCM and KM estimates give the same result for the time of hospitalisation, and are represented with a single thick black line (top left).

Furthermore, when the final outcome is experienced by only a proportion of patients (‘admission to ICU’, ‘death’, ‘discharge’), the KM (with both the complete and reduced samples) overestimates the time-to-event showing longer LoS than the NP-MCM. The E estimator underestimates the time-to-event due to right censoring, showing shorter values of LoS as it only takes into account patients who experienced the event. The NP-MCM estimates do not suffer from a similar bias [Reference Safari, López-de-Ullibarri and Jácome12]. In fact, this is one of the advantages of using these methods. Interestingly, we found small differences between the NP-MCM estimates and the E estimates (Fig. 1 and Supplementary Fig. S2). The reason might be that the distribution of the LoS of the dismissed patients in the E estimation (never admitted to ICU) is similar to that of the patients who required ICU.

Importantly, for the probability of the medical event (admission from HW to ICU, death, discharge from HW or ICU) we showed that not correcting for right censoring (i.e. using the E estimator with only individuals with the observed outcome) underestimates the true probability, as the event of the right-censored individuals could be recorded later in time. The NP-MCM can adjust to right censoring, providing more accurate estimates. This can be seen when comparing individual probabilities using NP-MCM and E estimators (Table 1). Note that in Table 1, the probabilities of mutually exclusive outcomes are not equal to 1. This is because the final outcome (death or discharge) of 42 inpatients still in ICU at the end of the study remains unknown, as detailed in Supplementary material Sections S2.4 and S2.5. This inconsistency of the E estimates is partially corrected by the NP-MCM estimate.

Table 1. Estimated probabilities of the different medical events for the COVID-19 patients in Galicia (Spain) using NP-MCM and empirical estimators

HW, hospital ward; ICU, intensive care unit.

Then, we used the NP-MCM estimator to assess if age and sex could play a role in the estimates of the time of hospitalisation (both HW and ICU) and the time in ICU. Figure 2 shows that the LoS differ significantly between male and female patients, and between middle-aged (40 years) and older (70 years) patients. Particularly, we found that middle-aged female patients showed shorter LoS in both the institution of hospital and the ICU, whereas older females showed longer LoS in the ICU (but not in the hospital) compared to their male counterparts.

Fig. 2. Generalised product-limit estimator [13] of the conditional survival function S(t|x) for the time of hospitalisation, both in HW and ICU (top) and the time in ICU (bottom), incorporating the effect of the sex (male = black line, female = red line) and the ages 40 years (left) and 70 years (right) for all the COVID-19 hospitalised cases (n = 2453) in Galicia (Spain).

Finally, we implemented a COVID-19 outbreak simulation of N = 1000 infected individuals, using the NP-MCM estimates for the COVID-19 patients in Galicia (Spain) and accounting for age and sex heterogeneity in the LoS. Figure 3 shows the difference in the simulated number of inpatients between considering age and sex (conditional model) or not considering age and sex of the patient (unconditional model). The higher the curve the more inpatients the model predicts. The real number of inpatients in the COVID-19 dataset, rescaled to N = 1000 infected people, has been added as the reference. We found no large differences in the expected number of patients dead or discharged, regardless of age and sex are considered or not. Similarly, no large differences were found in the number of patients in ICU during the first month (until day 30 approximately) of the epidemic. However, after 1 month, the unconditional model tends to underestimate the number of patients in ICU, whereas the conditional estimation is closer to the real number of ICU inpatients in the COVID-19 dataset. The findings for the estimated number of patients in the HW are similar for a period of 2 months. As a consequence, if the prediction of HW and ICU beds’ demand is estimated disregarding the sex and age of the patients, those predictions will be clearly underestimated, mainly in the case of ICU capacity. The consequences of this wrong forecast of HW and ICU occupancy are shown in Figure 4. For a range of possible capacities (15–90 beds in HW, 5–15 beds in ICU), Figure 4 shows the number of days in which the predicted number of patients exceeds the capacity. As expected, there is a decreasing trend, since the lower capacity the more days with excess demand. If age and sex are disregarded for making predictions (unconditional model), Figure 4 (right) suggests that 11 ICU beds are enough to avoid overload in the ICU. However, that ICU capacity will be exceeded for 18 days, and the available ICU beds should be set to 12 instead to prevent overwhelming of the ICU. Similar conclusions can be drawn when predicting HW beds demand in Figure 4 (left). These discrepancies, simulated for N = 1000 infected people, would worsen as the incidence increases. In summary, although no large differences are observed in the predicted number of deaths and discharges, the conditional model gives more accurate estimates of the HW and ICU beds’ demand. This leads to a reduction in the number of days when the number of inpatients exceeds the HW and ICU capacity.

Fig. 3. Number of patients in HW (top left), ICU (top right), deaths (bottom left) and discharges (bottom right) computed from 1000 simulated COVID-19 outbreaks for a period of 200 days, when the LoS are simulated depending on age and sex (blue), and unconditionally ignoring age and sex dependence (red). The real case counts of inpatients in the COVID-19 dataset from March 6th (day 1) to May 7th 2020 (day 62), rescaled to N = 1000 infected people, are also included for reference (solid black line).

Fig. 4. Number of days when the demand for beds is above the capacity in HW (left) and ICU (right), for different possible capacities and computed from 1000 simulated COVID-19 outbreaks. The demand was simulated conditionally depending on age and sex (blue), and unconditionally ignoring age and sex dependence (red).

Discussion

We applied a NP-MCM to estimate the time-to-event and event probabilities, including LoS in HW and ICU and time to death or discharge. In this study, we demonstrate how the LoS of hospitalised COVID-19 patients evolve over time, given age and sex distributions matching those from our database. The proposed model outperformed the KM and the empirical estimators when the outcome is not experienced by all patients. Importantly, the model can be adjusted for the use of covariates, which is significant when conditioning for known heterogeneity in estimating LoS. Particularly, our analysis demonstrates that adjusting for age and sex is crucial in accurately understanding ICU LoS and, in turn, forecasting bed demand.

Often studies with incomplete follow-up data on patients choose to exclude these patients from the study altogether, which yields biased estimates [Reference Lapidus19]. Moreover, when forecasting hospital demand in (near) real time, information related to the most recent cases is not available, which again leads to right censored data. We showed that the empirical estimator introduced significant bias towards longer LoS from HW admission to ICU admission because it ignores patients in the HW without ICU admission. Alternately, the KM estimator yields biased estimates towards longer stays as well when the event is not experienced by all patients. The reason is that the KM estimator assumes that if the follow-up time was long enough, much longer stays would be observed. Therefore, by comparing the NP-MCM against the KM estimate, we show how biased the results are when using the KM estimate. This comparison would support the use of the non-parametric cure model approach, which diminishes the problem that occurs when not all subjects experience the event.

Our findings are consistent with previous studies: a recent systematic review has shown that median overall hospital stays ranged from 4 to 21 days outside of China [Reference Rees20], whereas our model estimated a median overall hospital stay of 11 days (IQR: 7–19); the LoS for patients who died in the HW was generally shorter than those discharged alive (median 7 and 10 days, respectively). In contrast, our estimates show a different trend with regards to ICU LoS, with similar median estimates for both death and discharged (15 vs. 14 days), again consistent with that reviewed by Rees et al. [Reference Rees20]. Of note, to our knowledge only two studies have adjusted LoS by age, all showing increased LoS for increased age, which is consistent with our findings [Reference Lewnard4, Reference Wang21]. Furthermore, as far as we know this is the first study showing the influence of sex in the LoS, which has important implications for predicting hospital demand (Fig. 2). With regards to prediction models, some approaches adjust estimates based on age [Reference Moghadas22, Reference Li23], whereas sex has generally been overlooked in hospital demand forecasting [Reference Moghadas22, Reference Wood24, Reference Grasselli25].

Noteworthily, multi-state models [Reference Andersen and Keiding26–Reference Meira-Machado, de Uña-Álvarez and Datta29] could seem an alternative method. Yet, multi-state cure models, in which transitions into one or more of the states cannot occur for a fraction of the population, are quite recent and the scarce literature is limited to semi and parametric models [Reference Colon, Taylor and Sargent30, Reference Beesley and Taylor31]. Applying a non-parametric multi-state cure model is not straightforward, since there is no available literature related to this model. As a consequence, multi-state cure models were not used in this paper, but remain as a potential alternative approach.

Finally, we highlight key limitations of our model: the lack of a parametric function limits interpretability to a great extent and complicates handling several covariates simultaneously [Reference Bellman32]. Regarding the application of MCM, there must be good evidence that some individuals in the population will never experience the event of interest and the follow-up time must be long enough [Reference Farewell33]. To date, there has not been developed a reliable method for computing uncertainty (confidence bands) using MCM estimators, which remains a limitation of our approach. Finally, data on patient comorbidities, which likely represent an important source of heterogeneity in the LoS, were not available for the analysis. Thus, more accurate estimates of the different LoS can be obtained if more complete datasets are available.

In summary, we implemented an NP-MCM that improved the standard survival methodology when estimating LoS until final outcomes that will not happen for all patients. We also found that the LoS in the ICU is sensitive to age and sex, which in turn is relevant when forecasting hospital demand in real time for public health response. We believe our proposed approach can be easily implemented in other settings and can provide more accurate estimates of COVID-19 health demand compared to previous methods.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S0950268821000959.

Acknowledgements

We thank the Dirección Xeral de Saúde Pública, Xunta de Galicia for providing the database.

Author contributions

All authors contributed to the study and model design and formulation. ALC, MAJ and RC implemented the models and performed statistical analysis. All authors interpreted results and contributed to writing of the manuscript.

Financial support

ALC was sponsored by the BEATRIZ GALINDO JUNIOR Spanish from MICINN (Ministerio de Ciencia, Innovación y Universidades) with reference BGP18/00154. ALC, MAJ and RC acknowledge partial support by the MINECO (Ministerio de Economía y Competitividad) Grant MTM2014-52876-R (EU ERDF support included) and the MICINN Grant MTM2017-82724-R (EU ERDF support included) and partial support of Xunta de Galicia (Centro Singular de Investigación de Galicia accreditation ED431G 2019/01 and Grupos de Referencia Competitiva ED431C-2020-14 and ED431C2016-015) and the European Union (European Regional Development Fund - ERDF). PMD is a current recipient of the Grant of Excellence for postdoctoral studies by the Ramón Areces Foundation.

Conflict of interest

None.

Data availability statement

The data that support the findings of the study is available as aggregates in Table S1 of the Supplementary material. Line list data are available upon a request to the General Directorate of Public Health of the Autonomous Government of Galicia in Spain (Dirección Xeral de Saúde Pública, Xunta de Galicia).

References

World Health Organization (WHO). Coronavirus disease (COVID-19) weekly Epidemiological Update 109 January 2021. Available at https://www.who.int/publications/m/item/weekly-epidemiological-update---12-january-2021.Google Scholar

Kissler, SM et al. (2020) Projecting the transmission dynamics of SARS-CoV-2-through the postpandemic period. Science (New York, N.Y.) 368, 860–868.CrossRef Google Scholar PubMed

Grasselli, G, Pesenti, A and Cecconi, M (2020) Critical care utilization for the COVID-19 outbreak in Lombardy, Italy: early experience and forecast during an emergency response. Journal of the American Medical Association 323, 1545–1546.CrossRef Google Scholar PubMed

Lewnard, JA et al. (2020) Incidence, clinical outcomes, and transmission dynamics of severe coronavirus disease 2019 in California and Washington: prospective cohort study. British Medical Journal 369, m2205.Google Scholar PubMed

Thai, P et al. (2020) Factors associated with the duration of hospitalization among COVID-19 patients in Vietnam: a survival analysis. Epidemiology & Infection 148, E114.CrossRef Google Scholar PubMed

Wang, Z et al. (2020) Survival analysis of hospital length of stay of novel coronavirus (COVID-19) pneumonia patients in Sichuan, China. medRxiv; 040720057299. DOI: 10.1101/2020.04.07.20057299.CrossRef Google Scholar

Kaplan, EL and Meier, P (1958) Nonparametric estimation from incomplete observations. Journal of the American Statistical Association 53, 457–481.CrossRef Google Scholar

Boag, JW (1949) Maximum likelihood estimates of the proportion of patients cured by cancer therapy. Journal of the Royal Statistical Society, Series B (Statistical Methodology) 11, 15–53, Available at http://www.jstor.org/stable/2983694.Google Scholar

Dirección Xeral de Saúde Pública (General Directorate of Public Health), Xunta de Galicia (Autonomous Government of Galicia, NW Spain). Available at https://www.sergas.es/Saude-publica.Google Scholar

Gude, F et al. (2021) Development and validation of a prognostic model based on comorbidities to predict COVID-19 severity. A population-based study. International Journal of Epidemiology 50, 64–74. doi: 10.1093/ije/dyaa209.CrossRef Google Scholar

Maller, RA and Zhou, S (1996) Survival Analysis with Long-Term Survivors. Chichester, U.K.: Wiley. doi: 10.1002/cbm.318.Google Scholar

Safari, WC, López-de-Ullibarri, I and Jácome, MA (2021) A product-limit estimator of the conditional survival function when cure status is partially known. Biometrical Journal, 1–22. doi: 10.1002/bimj.202000173.Google Scholar PubMed

R Core Team (2019) R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. Available at https://www.R-project.org/.Google Scholar

Beran, R (1981) Nonparametric regression with randomly censored survival data. Technical Report. University of California, Berkeley.Google Scholar

Xu, J and Peng, Y (2014) Nonparametric cure rate estimation with covariates. Canadian Journal of Statistics 42, 1–17.CrossRef Google Scholar

López-Cheda, A et al. (2017) Nonparametric incidence estimation and bootstrap bandwidth selection in mixture cure models. Computational Statistics & Data Analysis 105, 144–165.CrossRef Google Scholar

López-Cheda, A, Jácome, MA and Cao, R (2017) Nonparametric latency estimation for mixture cure models. Test 26, 353–376.CrossRef Google Scholar

López-de-Ullibarri, I, López-Cheda, A and Jácome, MA. npcure: Nonparametric estimation in mixture cure models. 2019 R package version 0.1-4. Available at https://CRAN.R-project.org/package=npcure.Google Scholar

Lapidus, N et al. (2020) Biased and unbiased estimation of the average lengths of stay in intensive care units in the COVID-19 pandemic. Annals of Intensive Care Unit 10, 135.CrossRef Google Scholar

Rees, EM et al. (2020) COVID-19 length of hospital stay: a systematic review and data synthesis. BMC Medicine 18, 270.CrossRef Google Scholar PubMed

Wang, L et al. (2020) Coronavirus disease 2019 in elderly patients: characteristics and prognostic factors based on 4-week follow-up. Journal of Infection 80, 639–645.CrossRef Google Scholar PubMed

Moghadas, SM et al. (2020) Projecting hospital utilization during the COVID-19 outbreaks in the United States. Proceedings of the National Academy of Sciences of the United States of America 117, 9122–9126.CrossRef Google Scholar PubMed

Li, R et al. (2020) Estimated demand for US hospital inpatient and intensive care unit beds for patients with COVID-19 based on comparisons with Wuhan and Guangzhou, China. Journal of the American Medical Association Network Open 3, e208297.Google Scholar PubMed

Wood, RM et al. (2020) COVID-19 scenario modelling for the mitigation of capacity-dependent deaths in intensive care. Health Care Management Science 23, 315–324.10.1007/s10729-020-09511-7CrossRef Google Scholar PubMed

Grasselli, G et al. (2020) Risk factors associated with mortality among patients with COVID-19 in intensive care units in Lombardy, Italy. Journal of the American Medical Association Internal Medicine 180, 1345–1355.Google Scholar PubMed

Andersen, PK and Keiding, N (2002) Multi-state models for event history analysis. Statistical Methods in Medical Research 11, 91–115.CrossRef Google Scholar PubMed

Meira-Machado, L, de Uña-Álvarez, J and Cadarso-Suárez, C (2009) Multi-state models for the analysis of time-to-event data. Statistical Methods in Medical Research 18, 195–222.CrossRef Google Scholar PubMed

Meira-Machado, L, de Uña-Álvarez, J and Cadarso-Suárez, C (2006) Nonparametric estimation of transition probabilities in a non-Markov illness-death model. Lifetime Data Analysis 12, 325–344.CrossRef Google Scholar

Meira-Machado, L, de Uña-Álvarez, J and Datta, S (2015) Nonparametric estimation of conditional transition probabilities in a non-Markov illness-death model. Computational Statistics 30, 377–397.CrossRef Google Scholar

Colon, ASC, Taylor, JMG and Sargent, DJ (2013) Multi-state models for colon cancer recurrence and death with a cured fraction. Statistics in Medicine 33, 1750–1766.CrossRef Google Scholar

Beesley, LJ and Taylor, JMG (2019) EM algorithms for fitting multistate cure models. Biostatistics 20, 416–432.CrossRef Google Scholar PubMed

Bellman, RE (1961) Adaptive Control Processes. Princeton, NJ: Princeton University Press.CrossRef Google Scholar

Farewell, VT (1982) The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38, 1041–1046.CrossRef Google Scholar PubMed

Table 1. Estimated probabilities of the different medical events for the COVID-19 patients in Galicia (Spain) using NP-MCM and empirical estimators

López-Cheda et al. supplementary material

File 320 KB

Article contents

Estimating lengths-of-stay of hospitalised COVID-19 patients using a non-parametric model: a case study in Galicia (Spain)

Abstract

Keywords

Introduction

Materials and methods

Data source

Model formulation

COVID-19 outbreak simulation model

Results

Discussion

Supplementary material

Acknowledgements

Author contributions

Financial support

Conflict of interest

Data availability statement

References

López-Cheda et al. supplementary material

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests