Poisson approximations for conditional r-scan lengths of multiple renewal processes and application to marker arrays in biomolecular sequences

Published online by Cambridge University Press:  14 July 2016

Chingfer Chen*
Affiliation:
Stanford University
Samuel Karlin*
Affiliation:
Stanford University
* Postal address: Department of Mathematics, Stanford University, Stanford, CA 94305-2125, USA
** Email address: fd.zgg@forsythe.stanford.edu

Abstract

This study is motivated by problems of molecular sequence comparison for multiple marker arrays with correlated distributions. The model assumes two (or more) kinds of markers, say Markers A and B, distributed along the DNA sequence. The two primary conditions of interest are (i) many B markers (say ≥ m) occur, and (ii) few B markers (say ≤ l) occur. We call these the conditional r-scan models, and ask to what extent Marker A clusters or is over-dispersed in regions satisfying condition (i) or (ii). Limiting distributions for the extremal r-scan statistics from the A array under conditions (i) and (ii) are derived by extending the Chen–Stein Poisson approximation method.
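To make the conditional r-scan idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's construction. It assumes an r-scan is the distance spanned by r + 1 consecutive A markers (the sum of r consecutive inter-marker gaps), and that the B count is taken over the closed interval covered by that scan. The function name `conditional_r_scans` and the parameters `min_b` and `max_b` (standing in for m and l) are hypothetical.

```python
from bisect import bisect_left, bisect_right


def conditional_r_scans(a_positions, b_positions, r, min_b=None, max_b=None):
    """Return lengths of the r-scans of A that satisfy a B-count condition.

    Assumption (not taken from the paper): an r-scan is the distance between
    A marker i and A marker i + r, and the B count is the number of B markers
    in the closed interval between those two A markers.
    """
    a = sorted(a_positions)
    b = sorted(b_positions)
    lengths = []
    for i in range(len(a) - r):
        start, end = a[i], a[i + r]
        # Count B markers in [start, end] by binary search on the sorted list.
        b_count = bisect_right(b, end) - bisect_left(b, start)
        if min_b is not None and b_count < min_b:
            continue  # condition (i), "many B", not met
        if max_b is not None and b_count > max_b:
            continue  # condition (ii), "few B", not met
        lengths.append(end - start)
    return lengths


# Toy marker positions, invented purely for illustration.
a_markers = [1, 4, 6, 15, 17, 18]
b_markers = [2, 3, 5, 16]

many_b = conditional_r_scans(a_markers, b_markers, r=2, min_b=2)
few_b = conditional_r_scans(a_markers, b_markers, r=2, max_b=1)

# Extremal statistics: a small minimum suggests clustering of A,
# a large maximum suggests over-dispersion of A.
print(min(many_b), max(few_b))
```

The minimum and maximum of the returned lengths play the role of the extremal r-scan statistics mentioned in the abstract; the limiting distributions derived in the paper are not reproduced here.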

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 2000 

Footnotes

Supported in part by NIH Grants 2R01HG00335-11 and 5R01GM10452-35.

References

Arratia, R., Goldstein, L., and Gordon, L. (1989). Two moments suffice for Poisson approximations: the Chen–Stein method. Ann. Prob. 17, 9–25.
Barbour, A. D., Holst, L., and Janson, S. (1992). Poisson Approximation. Oxford Scientific Publications.
Chen, L. H. Y. (1975). Poisson approximation for dependent trials. Ann. Prob. 3, 534–545.
Dembo, A., and Karlin, S. (1992). Poisson approximations for r-scan processes. Ann. Appl. Prob. 2, 329–357.
Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press.
Gerstein, M. (1997). A structure census of genomes: comparing bacterial, eukaryotic, and archaeal genomes in terms of protein structure. J. Molec. Biol. 274, 562–576.
Karlin, S. (1968). Total Positivity. Stanford University Press.
Karlin, S., and Brendel, V. (1992). Chance and statistical significance in protein and DNA sequence analysis. Science 257, 39–49.
Karlin, S., and Cardon, L. R. (1994). Computational DNA sequence analysis. Ann. Rev. Microbiol. 48, 619–654.
Karlin, S., and Macken, C. (1991). Some statistical problems in the assessment of inhomogeneities of DNA sequence data. J. Amer. Statist. Assoc. 86, 26–33.
Karlin, S., Mrázek, J., and Campbell, A. (1996). Frequent oligonucleotides and peptides of the Haemophilus influenzae genome. Nucleic Acids Res. 24, 4263–4272.
Karlin, S., and Taylor, H. M. (1975). A First Course in Stochastic Processes. 2nd edn. Academic Press, New York.
Naus, J. I. (1979). An indexed bibliography of clusters clumps and coincidences. Int. Statist. Rev. 47, 47–78.
Naus, J. I. (1982). Approximation of distributions of scan statistics. J. Amer. Statist. Assoc. 77, 177–183.
Reinert, G., and Schbath, S. (1998). Compound Poisson and Poisson approximations for occurrences of multiple words in Markov chains. J. Comput. Biol. 5, 223–53.
Waterman, M. S. (1995). Introduction to Computational Biology: Maps, Sequences and Genomes. Chapman and Hall, New York.