Hostname: page-component-8448b6f56d-cfpbc Total loading time: 0 Render date: 2024-04-25T04:53:00.049Z Has data issue: false hasContentIssue false

Shortest common superstrings of random strings

Published online by Cambridge University Press:  14 July 2016

Kenneth S. Alexander*
Affiliation:
University of Southern California
*
Postal address: Department of Mathematics, University of Southern California, 1042 West 36th Place, DRB 155, Los Angeles, California 90089–1113, USA.

Abstract

Given a finite collection of strings of letters from a fixed alphabet, it is of interest, in the contexts of data compression and DNA sequencing, to find the length of the shortest string which contains each of the given strings as a consecutive substring. In order to analyze the average behavior of the optimal superstring length, substrings of specified lengths are considered with the letters selected independently at random. An asymptotic expression is obtained for the savings from compression, i.e. the difference between the uncompressed (concatenated) length and the optimal superstring length.

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 1996 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Footnotes

Research supported by NSF grant DMS-9206139.

References

[1] Arratia, R. and Waterman, M. S. (1985) Critical phenomena in sequence matching. Ann. Prob. 13, 12361249.Google Scholar
[2] Blum, A., Jiang, T., Li, M., Tromp, J. and Yannakakis, M. (1991) Linear approximation of shortest superstrings. Proc. 23rd ACM Symp. on Theory of Computing. pp. 328336.Google Scholar
[3] Huang, X. (1992) A contig assembly program based on sensitive detection of fragment overlaps. Genomics 14, 1825.CrossRefGoogle ScholarPubMed
[4] Lander, E. S. and Waterman, M. S. (1988) Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2, 231239.Google Scholar
[5] Peltola, H., Söderlund, H., Tarhio, J. and Ukkonen, E. (1983) Algorithms for some string matching problems arising in molecular genetics. In Information Processing 83. ed. Mason, R. E. A. North-Holland, Amsterdam. pp. 5364.Google Scholar
[6] Tarhio, J. and Ukkonen, E. (1986) A greedy approximation algorithm for constructing shortest common superstrings. Theor. Comp. Sci. 57, 131145.Google Scholar
[7] Turner, J. (1989) Approximation algorithms for the shortest common superstring problem. Inf. Comput. 83, 120.Google Scholar
[8] Waterman, M. S. (1994) Introduction to Computational Biology: Maps, Sequences and Genomes. Chapman and Hall, London.Google Scholar