
The statistical analysis of direct repeats in nucleic acid sequences

  • Rakesh Shukla (a1) and R. C. Srivastava (a2)


Sequence symmetries in DNA and RNA are being discovered at an increasing rate. Conjectures and hypotheses are being proposed for their possible structural and functional role in the nucleic acid. In this paper a probability model is studied which evaluates the probabilities of various repeats occurring by chance alone. Expressions are derived for the mean and variance of the statistics employed. The central limit theorem for dependent trials is used to obtain the asymptotic distributions. An indication is given of how to use the model to search for various gene amplification events in the evolutionary history of the sequences.
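As a rough illustration of what "repeats occurring by chance alone" can mean, the sketch below uses a deliberately simple null model that is an assumption of this example and not necessarily the model studied in the paper: bases are independent and uniformly distributed over A, C, G, T, and a "repeat" is any pair of positions (overlapping or not) at which the same k-mer starts. Under those assumptions each pair of start positions matches with probability 4^(-k), so by linearity of expectation the mean number of matching pairs is C(n-k+1, 2) / 4^k. The function names (`expected_repeat_pairs`, `count_repeat_pairs`, `simulated_mean`) are hypothetical helpers introduced only for this example.

```python
import random
from collections import Counter
from math import comb


def expected_repeat_pairs(n: int, k: int, alphabet_size: int = 4) -> float:
    """Expected number of position pairs sharing the same k-mer in a
    sequence of length n, ASSUMING independent, uniformly distributed
    letters (a simplifying assumption, not the paper's derivation)."""
    positions = n - k + 1  # number of k-mer start positions
    if positions < 2:
        return 0.0
    # Each of the C(positions, 2) pairs matches with probability alphabet_size**-k.
    return comb(positions, 2) / alphabet_size ** k


def count_repeat_pairs(seq: str, k: int) -> int:
    """Count pairs of start positions whose k-mers are identical."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # A k-mer seen c times contributes C(c, 2) matching pairs.
    return sum(comb(c, 2) for c in counts.values())


def simulated_mean(n: int, k: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean pair count under the same null model."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seq = "".join(rng.choices("ACGT", k=n))
        total += count_repeat_pairs(seq, k)
    return total / trials


if __name__ == "__main__":
    n, k = 200, 6
    print("expected :", expected_repeat_pairs(n, k))
    print("simulated:", simulated_mean(n, k))
```

Comparing the count observed in a real sequence with this expectation gives only a first impression. A significance statement also needs the variance, and the pair indicators are dependent, which is why the abstract appeals to a central limit theorem for dependent trials.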


Corresponding author

∗ Postal address: Institute of Environmental Health, Division of Biostatistics, University of Cincinnati Medical Center, Wherry Hall (#183), Cincinnati, OH 45267, USA.
∗∗ Postal address: Department of Statistics, The Ohio State University, Columbus, OH 43210, USA.


[1] Brezinski, D. P. (1975) Statistical significance of DNA sequence symmetries. Nature, London 253, 128–130.
[2] Dykes, G., Bambara, R., Marians, K., and Wu, R. (1975) On the statistical significance of primary structural features found in DNA-protein interaction sites. Nucleic Acids Res. 2, 327–345.
[3] Galas, D. J. (1978) On the symmetries of multi palindromic DNA sequences. J. Theoret. Biol. 72, 57–73.
[4] Hoeffding, W. and Robbins, H. (1948) The central limit theorem for dependent random variables. Duke Math. J. 15, 773–780.
[5] De Wachter, R. (1981) The number of repeats expected in random nucleic acid sequences and found in genes. J. Theoret. Biol. 91, 71–98.

