Poisson approximations for conditional r-scan lengths of multiple renewal processes and application to marker arrays in biomolecular sequences

Chingfer Chen; Samuel Karlin

doi:10.1239/jap/1014842842



Published online by Cambridge University Press: 14 July 2016


Chingfer Chen*, Stanford University
Samuel Karlin*, Stanford University
* Postal address: Department of Mathematics, Stanford University, Stanford, CA 94305-2125, USA
** Email address: fd.zgg@forsythe.stanford.edu


Abstract

This study is motivated by problems of molecular sequence comparison for multiple marker arrays with correlated distributions. The model assumes two (or more) kinds of markers, say Markers A and B, distributed along the DNA sequence. The two primary conditions of interest are (i) many occurrences of Marker B (say ≥ m), and (ii) few occurrences of Marker B (say ≤ l). We call these the conditional r-scan models, and ask to what extent Marker A clusters or is over-dispersed in regions satisfying condition (i) or (ii). Limiting distributions for the extremal r-scan statistics of the A array under conditions (i) and (ii) are derived by extending the Chen-Stein Poisson approximation method.
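As an informal illustration only, the conditional r-scan idea can be sketched in a few lines of Python. The sketch assumes the usual definition of an r-scan length as the sum of r consecutive spacings between A markers, and it assumes that the "region" tested against conditions (i) and (ii) is the closed interval spanned by that r-scan; the abstract does not spell out either detail. The function names, the toy marker positions, and the values of r, m and l are all hypothetical and are not taken from the paper.

```python
from bisect import bisect_left, bisect_right


def r_scans(positions, r):
    """Return (start, end, length) for every run of r consecutive spacings."""
    xs = sorted(positions)
    return [(xs[i], xs[i + r], xs[i + r] - xs[i]) for i in range(len(xs) - r)]


def count_in(sorted_positions, lo, hi):
    """Count markers lying in the closed interval [lo, hi]."""
    return bisect_right(sorted_positions, hi) - bisect_left(sorted_positions, lo)


def conditional_r_scans(a_positions, b_positions, r, keep):
    """r-scan lengths of A whose spanned interval has a B count accepted by keep()."""
    bs = sorted(b_positions)
    return [
        length
        for lo, hi, length in r_scans(a_positions, r)
        if keep(count_in(bs, lo, hi))
    ]


# Toy marker positions along a sequence (hypothetical data).
a = [2, 5, 6, 9, 20, 24, 31, 33]
b = [3, 4, 7, 8, 22, 40]
r, m, l = 2, 2, 1

# Condition (i): at least m B markers inside the A r-scan interval.
many_b = conditional_r_scans(a, b, r, lambda k: k >= m)
# Condition (ii): at most l B markers inside the A r-scan interval.
few_b = conditional_r_scans(a, b, r, lambda k: k <= l)

# Extremal statistics: a small minimum suggests clustering of A,
# a large maximum suggests over-dispersion of A.
print("min r-scan under (i):", min(many_b))
print("max r-scan under (ii):", max(few_b))
```

The sketch only computes the empirical extremal statistics; judging whether they are unusually small or large requires the limiting distributions derived in the paper, which are not reproduced here.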

Keywords

r-scan statistics; Chen-Stein method; total variation distance; total positivity

MSC classification

Primary: 60E05: Distributions

Secondary: 62E20: Asymptotic distribution theory

Type: Research Papers
Information: Journal of Applied Probability, Volume 37, Issue 3, September 2000, pp. 865–880

DOI: https://doi.org/10.1239/jap/1014842842
Copyright: © Applied Probability Trust 2000


Footnotes

Supported in part by NIH Grants 2R01HG00335-11 and 5R01GM10452-35.

