A statistical model investigating the prevalence of tuberculosis in New York City using counting processes with two change-points

J. A. ACHCAR; E. Z. MARTINEZ; A. RUFFINO-NETTO; C. D. PAULINO; P. SOARES

doi:10.1017/S0950268808000526

Published online by Cambridge University Press: 17 March 2008

J. A. ACHCAR: Departamento de Medicina Social, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Brazil
E. Z. MARTINEZ*: Departamento de Medicina Social, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Brazil
A. RUFFINO-NETTO: Departamento de Medicina Social, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Brazil
C. D. PAULINO: Departamento de Matemática, Universidade Técnica de Lisboa, Portugal
P. SOARES: Departamento de Matemática, Universidade Técnica de Lisboa, Portugal
* Author for correspondence: Dr E. Z. Martinez, Departamento de Medicina Social, Faculdade de Medicina de Ribeirão Preto, USP, Avenida Bandeirantes 3900, 14049-900 Ribeirão Preto, SP, Brazil. (Email: edson@fmrp.usp.br)

Summary

We considered a Bayesian analysis for the prevalence of tuberculosis cases in New York City from 1970 to 2000. This count dataset presented two change-points during this period. We modelled the data using non-homogeneous Poisson processes in the presence of the two change-points. The Bayesian analysis was carried out using Markov chain Monte Carlo methods. Simulated Gibbs samples for the parameters of interest were obtained using WinBUGS software.

Type: Original Papers
Information: Epidemiology & Infection, Volume 136, Issue 12, December 2008, pp. 1599–1605

DOI: https://doi.org/10.1017/S0950268808000526
Copyright © 2008 Cambridge University Press

INTRODUCTION

General overview

In 1993 the World Health Organization (WHO) declared tuberculosis (TB) a global public health emergency, being the only disease thus far to warrant that designation. Although hospitals have been established and chemotherapy has been developed to combat TB, bringing considerable reduction in incidence to developed nations, historical data calculated by the WHO indicate that there have not been great effects on the global problem since the time of Koch. Currently, TB is responsible for more human deaths than any other single infectious agent, representing 26% of all preventable deaths and 7% of all deaths [Reference Pio and Chaulet1].

The resurgence of TB has been attributed to several factors, such as the increase in drug resistance, the HIV/AIDS pandemic (at the beginning of the 1980s), the increase in the number of injecting drug users, changes in social structure, increased immigration from high-prevalence nations to developed ones, the ageing of the world's population, active transmission in crowded environments (e.g. prisons, hospitals, homeless shelters), and the dismantling of health-care systems [Reference Ducati2]. Although TB became a re-emerging disease in European and North American nations, it is neither an emerging nor a re-emerging public health problem in developing countries such as Brazil, but rather a long-lasting one [Reference Ruffino-Netto3].

To make the interaction between these factors easier to understand, Ruffino-Netto [Reference Ruffino-Netto4] proposed an expression that reflects the TB burden. Its components are social inequality, prevalence of HIV-positive individuals, percentage of treatment default, prevalence of primary plus acquired resistance, migration, age of the population, adequate health services, directly observed treatment short-course, educational level, nutrition level, human resources for TB control and degree of political participation of the population.

Historical data on the observed numbers of TB cases, especially in developed countries, show a trend of declining incidence from the beginning of 1960 up to 1980, when this trend changed. Between 1980 and 1990 the incidence rates of TB increased; after 1990 they declined again. That is, there are two change-points in the rates of TB, especially for developed countries.

The case of New York City

The incidence (notification cases) of TB disease in New York City (NYC) between 1970 and 2000 presents three trends (see Table 1): a first period (1970–1979) where the trend of declining incidence was probably associated with good control programmes; a second period (1979–1992) where there was an increase in incidence rates [Reference Coker5, Reference Wallace6], possibly associated with a systematic dismantling of public-health infrastructure of control programmes, social disruption (including homelessness, drug abuse, poverty and housing overcrowding), and mainly caused by the HIV epidemic; a third period (1992–2000) where again there is a decline in incidence rates. It is important to remember the many factors associated with this third period, i.e. implementation of directly observed therapy, broader chemotherapy regimens for patients with TB or suspected multidrug-resistant TB and improved therapeutics for the care of HIV-infected individuals [Reference Paolo and Nosanchuk7].

Table 1. Number of tuberculosis cases in NYC from 1970 to 2000

Source: New York City Department of Health and Mental Hygiene, Bureau of Tuberculosis Control Information Survey.

Table 1 gives the yearly numbers and the accumulated numbers of TB cases in NYC. From Table 1, we see decreasing numbers of TB cases from 1970 to 1978, where there is a minimum. From 1978 to 1992, we observe increasing numbers of TB cases, where there is a maximum number of cases in 1992. From 1992 to 2000, we observe decreasing numbers of cases. That is, we have two change-points for the numbers of cases (see Fig. 1). It is interesting to note that the use of powerful antiviral drugs against HIV commenced around 1990.

Fig. 1. Number of tuberculosis cases in New York City, 1970–2000.

To model the number of TB cases in NYC during the period 1970–2000, we use a point process that counts the TB cases in each year, starting in 1970. We considered a stratified sample of size n=6721, representing 10% of the total number of TB cases (259 cases in 1970, 257 cases in 1971, 227 cases in 1972 and so on). Within each year, the occurrence time (in days) of each case was drawn from a uniform distribution, with time measured from 1 January 1970 until 30 December 2000, i.e. with a total observation time of T=11 323 days.
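As a rough illustration of this construction, the following Python sketch spreads a given number of cases uniformly over each calendar year and returns the sorted occurrence times in days. Only the three yearly sample sizes quoted above are used. The function name `simulate_case_days`, the seed and the convention that time 0 is the start of 1 January 1970 are assumptions made for the example and are not taken from the original analysis.

```python
import random
from datetime import date

ORIGIN = date(1970, 1, 1)  # assumed time origin (t = 0 at the start of this day)


def simulate_case_days(counts_by_year, seed=0):
    """Draw one uniform occurrence time (in days since ORIGIN) per case.

    counts_by_year maps a calendar year to the number of sampled cases
    notified in that year.
    """
    rng = random.Random(seed)
    times = []
    for year in sorted(counts_by_year):
        start = (date(year, 1, 1) - ORIGIN).days      # first day of the year
        end = (date(year + 1, 1, 1) - ORIGIN).days    # first day of next year
        times.extend(rng.uniform(start, end) for _ in range(counts_by_year[year]))
    return sorted(times)


# Sample sizes for the first three years, as quoted in the text.
sample_counts = {1970: 259, 1971: 257, 1972: 227}
times = simulate_case_days(sample_counts)
print(len(times), round(times[0], 2), round(times[-1], 2))
```

Because the times are drawn at random, a different seed gives a different set of epochs with the same yearly counts.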

For this dataset, we assume a non-homogeneous Poisson process (NHPP) in the presence of two change-points considering a Bayesian approach using Markov chain Monte Carlo (MCMC) methods (see e.g. Gelfand & Smith [Reference Gelfand and Smith8]). The use of Bayesian methods has been considered by many authors for analyses of homogeneous or non-homogeneous Poisson processes in the presence of change-points (see e.g. Raftery & Akman [Reference Raftery and Akman9] considering the presence of a change-point in homogeneous Poisson processes, or Ruggeri & Sivaganesan [Reference Ruggeri and Sivaganesan10] considering any number, random or fixed, of change-points in NHPP assuming power-law intensity functions).

The paper is organized as follows: in the Methods section, we introduce the likelihood function and a Bayesian analysis for the model; in the Results section, we introduce the analysis for the NYC data, and finally, in the Discussion, we present some concluding remarks.

METHODS

The likelihood function

Let N(t) be the cumulative number of TB cases that are observed during the interval (0, t) and assume that N(t) is modelled by an NHPP with intensity function λ(t)=dm(t)/dt=dE[N(t)]/dt, where m(t) is the mean value function (see e.g. Cox & Lewis [Reference Cox and Lewis11]). Different parametric forms could be assumed for the intensity function λ(t) (increasing, decreasing, bathtub shape, unimodal, among many others, see e.g. Musa & Okumoto [Reference Musa and Okumoto12] or Mudholkar et al. [Reference Mudholkar, Srivastava and Friemer13]). We assume power-law processes (PLP) in the presence of two change-points, with intensity function for the overall process given by

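$$
\lambda(t \mid \theta) =
\begin{cases}
\lambda_1(t) = (\beta_1/\alpha_1)\,(t/\alpha_1)^{\beta_1 - 1}, & 0 < t \le \zeta_1, \\
\lambda_2(t) = (\beta_2/\alpha_2)\,(t/\alpha_2)^{\beta_2 - 1}, & \zeta_1 < t \le \zeta_2, \\
\lambda_3(t) = (\beta_3/\alpha_3)\,(t/\alpha_3)^{\beta_3 - 1}, & t > \zeta_2,
\end{cases}
\tag{1}
$$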

where θ=(α₁, α₂, α₃, β₁, β₂, β₃, ζ₁, ζ₂).

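Equivalently, letting m_j(t)=m(t|θ_j), the corresponding mean value function is given by

$$
m(t \mid \theta) =
\begin{cases}
m_1(t), & 0 < t \le \zeta_1, \\
m_1(\zeta_1) + m_2(t) - m_2(\zeta_1), & \zeta_1 < t \le \zeta_2, \\
m_1(\zeta_1) + m_2(\zeta_2) - m_2(\zeta_1) + m_3(t) - m_3(\zeta_2), & t > \zeta_2,
\end{cases}
\tag{2}
$$

where m₁(t)=(t/α₁)^β₁, m₂(t)=(t/α₂)^β₂ and m₃(t)=(t/α₃)^β₃.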

Observe that the intensity function λ_j(t)=dm_j(t)/dt in equation (1) is proportional to t^(β_j−1), so it is constant for β_j=1, decreases for β_j<1 and increases for β_j>1, j=1, 2, 3. This process is related to the Weibull probability model with parameters (α_j, β_j), j=1, 2, 3 [Reference Kuo and Yang14].
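The piecewise definitions can be made concrete with a short Python sketch. It evaluates the power-law intensity λ_j(t)=dm_j(t)/dt and the cumulative mean value function for a given parameter vector. The `Theta` container, the function names and the numerical values are illustrative assumptions (they are not the estimates reported in this paper), and t is assumed to be strictly positive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Theta:
    alpha: tuple  # (alpha1, alpha2, alpha3), scale parameters
    beta: tuple   # (beta1, beta2, beta3), shape parameters
    zeta: tuple   # (zeta1, zeta2), change-points with zeta1 < zeta2


def m_j(t, theta, j):
    """Power-law mean value function of segment j (0-based): (t/alpha_j)**beta_j."""
    return (t / theta.alpha[j]) ** theta.beta[j]


def intensity(t, theta):
    """Intensity of the overall process at t > 0 (derivative of m_j on its segment)."""
    j = 0 if t <= theta.zeta[0] else (1 if t <= theta.zeta[1] else 2)
    a, b = theta.alpha[j], theta.beta[j]
    return (b / a) * (t / a) ** (b - 1.0)


def mean_value(t, theta):
    """Expected cumulative number of cases up to time t, continuous at the change-points."""
    z1, z2 = theta.zeta
    if t <= z1:
        return m_j(t, theta, 0)
    if t <= z2:
        return m_j(z1, theta, 0) + m_j(t, theta, 1) - m_j(z1, theta, 1)
    return (m_j(z1, theta, 0) + m_j(z2, theta, 1) - m_j(z1, theta, 1)
            + m_j(t, theta, 2) - m_j(z2, theta, 2))


# Hypothetical parameter values, chosen only to exercise the three regimes.
theta = Theta(alpha=(2.0, 50.0, 1.5), beta=(0.8, 2.0, 0.7), zeta=(3000.0, 8000.0))
for day in (1000.0, 5000.0, 10000.0):
    print(day, intensity(day, theta), mean_value(day, theta))
```

With these values the first and third segments have β<1 (decreasing intensity) and the middle segment has β>1 (increasing intensity), which is the pattern the model is meant to capture.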

Assuming that the data are observed up to a total time T, where the epochs of occurrence of cases are denoted by t_i, i=1, …, n, 0<t₁<t₂<…<t_n<T, the likelihood function for θ in the presence of two change-points ζ₁ and ζ₂ is given by

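$$
L(\theta \mid D_T) =
\Big[\prod_{i=1}^{N(\zeta_1)} \lambda_1(t_i)\Big] e^{-m_1(\zeta_1)}
\Big[\prod_{i=N(\zeta_1)+1}^{N(\zeta_2)} \lambda_2(t_i)\Big] e^{-[m_2(\zeta_2) - m_2(\zeta_1)]}
\Big[\prod_{i=N(\zeta_2)+1}^{n} \lambda_3(t_i)\Big] e^{-[m_3(T) - m_3(\zeta_2)]},
\tag{3}
$$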

where λ_j(t) is given in equation (1) and m_j(t) is given in equation (2) for j=1, 2, 3. To justify the likelihood function [equation (3)], observe that N(s+t) – N(s) given θ has a Poisson distribution P(m(s+t|θ) – m(s|θ)) for t>0 and that the process has independent increments [Reference Raftery and Akman9]. Thus, the first occurrence time U₁=t₁ has density λ(t₁)exp{−m(t₁)}, the second between-occurrence time U₂=t₂−t₁ has conditional density λ(t₂)exp{−[m(t₂)−m(t₁)]} given t₁, and so on. In this way, we obtain the likelihood of the data D_T={n; t₁, …, t_n; T} in the presence of two change-points. Moreover, observe that a homogeneous Poisson process in the presence of one change-point is a special case of equation (3) [Reference Raftery and Akman9].
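For numerical work the likelihood is usually evaluated on the log scale. The sketch below is one possible way to do this in Python under the power-law forms m_j(t)=(t/α_j)^β_j: each segment contributes the sum of log-intensities at its events minus the increment of its mean value function. The function name and the small dataset are hypothetical and serve only to check the formula against the homogeneous special case.

```python
import math
from bisect import bisect_right


def log_likelihood(times, T, alpha, beta, zeta):
    """Log-likelihood of sorted event times in (0, T) for a PLP with two change-points.

    alpha, beta: length-3 sequences; zeta: (zeta1, zeta2) with 0 < zeta1 < zeta2 < T.
    """
    z1, z2 = zeta
    n1 = bisect_right(times, z1)  # N(zeta1): events up to the first change-point
    n2 = bisect_right(times, z2)  # N(zeta2): events up to the second change-point
    segments = [
        (0.0, z1, times[:n1]),
        (z1, z2, times[n1:n2]),
        (z2, T, times[n2:]),
    ]
    total = 0.0
    for (lo, hi, events), a, b in zip(segments, alpha, beta):
        # Sum of log lambda_j(t_i) over the events falling in this segment.
        total += sum(math.log(b / a) + (b - 1.0) * math.log(t / a) for t in events)
        # Minus the expected number of events in the segment, m_j(hi) - m_j(lo).
        total -= (hi / a) ** b - (lo / a) ** b
    return total


# Hypothetical toy data: with beta_j = 1 and a common alpha the process is
# homogeneous with rate 1/alpha, so the result must equal n*log(1/alpha) - T/alpha.
toy_times = [1.0, 2.5, 4.0, 7.0]
value = log_likelihood(toy_times, 10.0, (2.0, 2.0, 2.0), (1.0, 1.0, 1.0), (3.0, 6.0))
print(value, 4 * math.log(0.5) - 10.0 / 2.0)
```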

A Bayesian analysis

For a Bayesian analysis of the PLP with intensity function given in equation (1) in the presence of two change-points ζ₁ and ζ₂, we assume uniform prior distributions for α_j and β_j given by

$$
\alpha_j \sim U(0, a_j), \qquad \beta_j \sim U(b_{1j}, b_{2j}),
\tag{4}
$$

for j=1, 2, 3, where a_j, b_1j and b_2j are known hyperparameters. Here b₁₁ and b₁₃ are set equal to 0 and b₂₁ and b₂₃ equal to 1 in order to have decreasing intensity functions in the intervals 0<t<ζ₁ and ζ₂<t<T; b₁₂ is set equal to 1 to have an increasing intensity function in the interval ζ₁<t<ζ₂; and a_j and b₂₂ are given large values (non-informative prior distributions for α_j, j=1, 2, 3). We also assume uniform prior distributions for the change-points ζ₁ and ζ₂, given by

$$
\zeta_\ell \sim U(c_\ell, d_\ell), \qquad \ell = 1, 2,
\tag{5}
$$

where c_ℓ and d_ℓ, ℓ=1, 2, are known hyperparameters chosen so that ζ₁<ζ₂. We further assume prior independence among the parameters.

The joint posterior distribution for θ is given by [Reference Box and Tiao15]

$$
\pi(\theta \mid D_T) \propto L(\theta \mid D_T),
\tag{6}
$$

where L(θ|D_T) is the likelihood function for the data D_T={n; t₁, …, t_n; T}, and 0<α_j<a_j, b_1j<β_j<b_2j, c_ℓ<ζ_ℓ<d_ℓ, j=1, 2, 3 and ℓ=1, 2.

To simulate samples from the joint posterior distribution [equation (6)], we could consider standard MCMC methods such as the Gibbs sampling algorithm [Reference Gelfand and Smith8] or the Metropolis–Hastings algorithm [Reference Smith and Roberts16]. In this case, we need all full conditional posterior distributions Π(θ_j|θ_(j), D_T), j=1, 2, …, K, where θ_(j)=(θ₁, …, θ_j−1, θ_j+1, …, θ_K). A great computational simplification is given by WinBUGS software [Reference Spiegelhalter, Thomas and Best17], where we only need to specify the joint distribution for the data and the prior distributions for the parameters.
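To make the idea of updating one parameter at a time concrete, the following Python sketch implements a generic random-walk Metropolis-within-Gibbs loop for a posterior with uniform priors on bounded intervals. It is a hand-written substitute shown for illustration only: it is neither the code used in this paper nor a description of the sampling methods that WinBUGS selects internally. The function names, step sizes and the toy target density are all assumptions.

```python
import math
import random


def metropolis_within_gibbs(log_post, init, bounds, steps, n_iter, seed=0):
    """Update each component in turn with a Gaussian random-walk proposal.

    log_post: function returning the log posterior (up to a constant) on the support.
    bounds:   (lower, upper) support of each uniform prior.
    steps:    proposal standard deviation for each component.
    """
    rng = random.Random(seed)
    theta = list(init)
    current = log_post(theta)
    draws = []
    for _ in range(n_iter):
        for k in range(len(theta)):
            proposal = list(theta)
            proposal[k] += rng.gauss(0.0, steps[k])
            lo, hi = bounds[k]
            if not lo < proposal[k] < hi:
                continue  # outside the prior support: posterior is zero, reject
            candidate = log_post(proposal)
            # Symmetric proposal, so the acceptance ratio is the posterior ratio.
            if rng.random() < math.exp(min(0.0, candidate - current)):
                theta, current = proposal, candidate
        draws.append(tuple(theta))
    return draws


def toy_log_post(theta):
    """Hypothetical target: independent unit-variance normals centred at (1, 2)."""
    x, y = theta
    return -0.5 * ((x - 1.0) ** 2 + (y - 2.0) ** 2)


draws = metropolis_within_gibbs(
    toy_log_post, init=(0.0, 0.0), bounds=[(-10.0, 10.0)] * 2,
    steps=(1.0, 1.0), n_iter=5000,
)
print(sum(d[0] for d in draws) / len(draws), sum(d[1] for d in draws) / len(draws))
```

For the change-point model, `log_post` would be the log of the likelihood in equation (3) and `bounds` would hold the prior intervals for the eight parameters; tuning the step sizes is left to the user.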

RESULTS

For a Bayesian analysis of the NYC TB data, we assumed the uniform prior distributions [equation (4)] for α_j and β_j with a_j=100, j=1, 2, 3, b₁₁=b₁₃=0 and b₂₁=b₂₃=1 (related to decreasing intensity functions), and b₁₂=1 and b₂₂=10 (related to an increasing intensity function between the first and second change-points). We also assumed the uniform prior distributions [equation (5)] for the change-points ζ₁ and ζ₂ with c₁=2558, d₁=4383, c₂=7671 and d₂=9131 (number of days since 1 January 1970). This choice of prior distributions, especially for the change-points ζ₁ and ζ₂, is based on medical knowledge of the epidemic, i.e. it is known that the first change-point lies between 1977 and 1982 and the second between 1991 and 1995. We are assuming in the prior distribution [equation (5)] that the two intervals do not overlap. Using WinBUGS software and considering a burn-in sample of size 40 000, we simulated a Gibbs sample of size 100 000 and kept every 50th draw for each parameter to obtain approximately uncorrelated samples, giving a final Gibbs sample of size 1200 (the 60 000 draws remaining after burn-in, thinned by 50) from which the posterior summaries for each parameter were computed. The WinBUGS code is given in the Appendix.
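The day indices used as prior bounds can be translated back to calendar dates, and the size of the thinned sample can be checked, with a few lines of Python. The sketch assumes that day 1 corresponds to 1 January 1970 and that the 40 000 burn-in iterations are part of the 100 000 simulated ones; both conventions are assumptions made here because they reproduce the figures quoted in the text.

```python
from datetime import date, timedelta

ORIGIN = date(1970, 1, 1)


def day_to_date(day):
    """Calendar date of a 1-based day index (assumption: day 1 = 1 January 1970)."""
    return ORIGIN + timedelta(days=day - 1)


def thinned_size(total, burn_in, thin):
    """Number of draws kept after discarding the burn-in and keeping every thin-th draw."""
    return (total - burn_in) // thin


# Hyperparameters of the uniform priors for the two change-points.
hyper = {"c1": 2558, "d1": 4383, "c2": 7671, "d2": 9131}
for name, day in hyper.items():
    print(name, day, day_to_date(day).isoformat())

print(thinned_size(100_000, 40_000, 50))
```

Under these conventions the bounds correspond to 1 January 1977 and 31 December 1981 for ζ₁, and to 1 January 1991 and 31 December 1994 for ζ₂, and the thinned sample has 1200 draws.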

Table 2 gives the posterior summaries for each parameter. Convergence of the Gibbs sampling algorithm was monitored by checking the plots of the simulated samples for each parameter to verify if a stationary distribution was obtained by the 1200 simulated Gibbs samples and also using other existing methods to check the convergence of the Gibbs sampling algorithm [Reference Gelman and Rubin18].

Table 2. Posterior summaries for the parameters

Figure 2 shows the plots of the marginal posterior distributions for the change-points ζ₁ and ζ₂ approximated by the simulated Gibbs samples. Inferences for the change-points are of great interest to epidemiologists.

Fig. 2. Marginal posterior distributions for the change-points. (a) ζ₁; (b) ζ₂.

Considering the Monte Carlo estimators for the posterior means of α₁, α₂, α₃, β₁, β₂, β₃, ζ₁ and ζ₂ given in Table 2, we obtain Bayesian estimators for the mean value function m(t) by replacing the parameters in equation (2) with these estimates.

Table 3 gives Monte Carlo Bayesian estimators for m(t) based on the 1200 simulated Gibbs samples and the observed accumulated numbers of TB cases for each year. Figure 3 shows the plot of the estimated mean value function and the observed accumulated number of TB cases against time (in days). We observe a good fit for the PLP in the presence of two change-points.
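Two estimators of m(t) are possible from a set of posterior draws: evaluating m(t) at the posterior means of the parameters, or averaging m(t) over the draws. The Python sketch below contrasts the two. The two parameter draws are hypothetical placeholders, not values from Table 2, and the function names are illustrative.

```python
def mean_value(t, alpha, beta, zeta):
    """Mean value function of the PLP with two change-points, m_j(t) = (t/alpha_j)**beta_j."""
    def m(s, j):
        return (s / alpha[j]) ** beta[j]

    z1, z2 = zeta
    if t <= z1:
        return m(t, 0)
    if t <= z2:
        return m(z1, 0) + m(t, 1) - m(z1, 1)
    return m(z1, 0) + m(z2, 1) - m(z1, 1) + m(t, 2) - m(z2, 2)


def posterior_mean_m(t, draws):
    """Monte Carlo estimate of E[m(t) | data]: average m(t) over the draws."""
    return sum(mean_value(t, *d) for d in draws) / len(draws)


def plug_in_m(t, draws):
    """Plug-in estimate: evaluate m(t) at the average of each parameter."""
    n = len(draws)
    averaged = tuple(
        tuple(sum(d[k][i] for d in draws) / n for i in range(len(draws[0][k])))
        for k in range(3)
    )
    return mean_value(t, *averaged)


# Hypothetical posterior draws, each given as (alpha, beta, zeta).
draws = [
    ((2.0, 50.0, 1.5), (0.80, 2.0, 0.70), (3000.0, 8000.0)),
    ((2.2, 52.0, 1.4), (0.82, 1.9, 0.72), (3100.0, 7900.0)),
]
for day in (2000.0, 5000.0, 10000.0):
    print(day, posterior_mean_m(day, draws), plug_in_m(day, draws))
```

Because m(t) is non-linear in the parameters, the two estimates generally differ, and the difference grows with the posterior spread of the parameters.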

Fig. 3. Mean value function. —, Estimated; ○, observed.

Table 3. Estimators for the mean value function and observed accumulated numbers

DISCUSSION

There was a great increase in TB prevalence in NYC during the 1980s and at the start of the 1990s, with a peak of 3811 cases in 1992. In 1978 a very low number of TB cases in NYC (1307 cases) was observed, following a long period of decreasing prevalence of the disease. The proposed model fitted the data on TB cases in NYC well, and it is straightforward to implement in WinBUGS.

The presence of more than one change-point is common in many applications involving medical count data. The approach used here for the NYC TB data of Table 1 could be easily extended to other epidemiological datasets with a finite number of change-points. In this way, the likelihood function [equation (3)] could be easily generalized to accommodate more than two change-points. It is usually very difficult to obtain classical inference results for the parameters of NHPP in the presence of change-points, and the use of MCMC methods is a suitable way of obtaining Bayesian inferences for this family of models. Using WinBUGS software greatly simplifies obtaining the posterior summaries of interest.

Other parametric forms for the intensity functions [equation (1)] could be considered in place of the PLP. In this case we could consider other intensity functions commonly used in software reliability studies, e.g. Gompertz growth, logistic growth, etc. [Reference Musa, Iannino and Okumoto19].

APPENDIX

The WinBUGS code used to fit the Bayesian model is given below. In this WinBUGS code, a₁, a₂, a₃ are the hyperparameters of the uniform prior distributions of α₁, α₂ and α₃, respectively; b₁, b₂, b₃ are the hyperparameters of the uniform prior distributions of β₁, β₂ and β₃, respectively [see equation (4)]; and c₁, d₁, c₂ and d₂ are the hyperparameters in equation (5). These hyperparameters are declared in the last line of the WinBUGS code.

DECLARATION OF INTEREST

None.

REFERENCES

1. Pio, A, Chaulet, P. Tuberculosis Handbook (WHO/TB/98.253). Geneva: World Health Organization, 1998.

2. Ducati, RG, et al. The resumption of consumption – a review on tuberculosis. Memórias do Instituto Oswaldo Cruz 2006; 101: 697–714.

3. Ruffino-Netto, A. Tuberculosis: the neglected calamity. Revista da Sociedade Brasileira de Medicina Tropical 2002; 35: 51–58.

4. Ruffino-Netto, A. Tuberculosis load: reflections on a theme. Jornal Brasileiro de Pneumologia 2004; 30: 307–309.

5. Coker, R. Lessons from New York's tuberculosis epidemic. British Medical Journal 1998; 317: 616.

6. Wallace, DN. Discriminatory public policies in the New York City tuberculosis epidemic, 1975–1993. Microbes and Infection 2001; 3: 515–524.

7. Paolo, WF Jr., Nosanchuk, JD. Tuberculosis in New York city: recent lessons and a look ahead. Lancet Infectious Diseases 2004; 4: 287–293.

8. Gelfand, AE, Smith, AFM. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 1990; 85: 398–409.

9. Raftery, AE, Akman, VE. Bayesian analysis of a Poisson process with a change-point. Biometrika 1986; 73: 85–89.

10. Ruggeri, F, Sivaganesan, S. On modeling change points in non-homogeneous Poisson processes. Statistical Inference for Stochastic Processes 2005; 8: 311–329.

11. Cox, DR, Lewis, PA. Statistical Analysis of Series of Events. London: Methuen, 1966.

12. Musa, JD, Okumoto, K. A logarithmic Poisson execution time model for software reliability measurement. Proceedings of Seventh International Conference on Software Engineering. Orlando, 1984, pp. 230–238.

13. Mudholkar, GS, Srivastava, DK, Friemer, M. The exponentiated Weibull family: a reanalysis of the bus-motor failure data. Technometrics 1995; 37: 436–445.

14. Kuo, L, Yang, TY. Bayesian computation for nonhomogeneous Poisson process in software reliability. Journal of the American Statistical Association 1996; 91: 763–773.

15. Box, GEP, Tiao, GC. Bayesian Inference in Statistical Analysis. New York: Addison-Wesley, 1973.

16. Smith, AFM, Roberts, GO. Bayesian computation via the Gibbs sampler and related Markov Chain Monte Carlo methods. Journal of the Royal Statistical Society B 1993; 55: 3–23.

17. Spiegelhalter, D, Thomas, A, Best, N. WinBUGS Version 1.3 User Manual. Cambridge: Medical Research Council Biostatistics Unit, 2000.

18. Gelman, A, Rubin, DB. Inference from iterative simulation using multiple sequences (with discussion). Statistical Science 1992; 7: 457–511.