Shortest common superstrings of random strings

Kenneth S. Alexander

doi:10.2307/3214990

Published online by Cambridge University Press: 14 July 2016

Affiliation: Kenneth S. Alexander, University of Southern California
Postal address: Department of Mathematics, University of Southern California, 1042 West 36th Place, DRB 155, Los Angeles, California 90089–1113, USA.

Abstract

Given a finite collection of strings of letters from a fixed alphabet, it is of interest, in the contexts of data compression and DNA sequencing, to find the length of the shortest string which contains each of the given strings as a consecutive substring. In order to analyze the average behavior of the optimal superstring length, substrings of specified lengths are considered with the letters selected independently at random. An asymptotic expression is obtained for the savings from compression, i.e. the difference between the uncompressed (concatenated) length and the optimal superstring length.
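The quantities named in the abstract can be illustrated on tiny inputs. The following is a minimal brute-force sketch, not the method of the paper: it assumes the standard fact that, once strings contained in other strings are discarded, an optimal superstring is obtained by merging the strings in some order with maximal pairwise overlaps. The function names (`overlap`, `shortest_superstring_length`, `savings`, `random_strings`) and the alphabet `"ACGT"` are hypothetical choices made for illustration only.

```python
from itertools import permutations
import random


def overlap(a, b):
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def shortest_superstring_length(strings):
    """Exact optimal superstring length by trying every ordering.

    Exponential in the number of strings; intended for small examples only.
    """
    unique = set(strings)
    # Strings contained in another string never lengthen the superstring.
    core = [s for s in unique if not any(s != t and s in t for t in unique)]
    if not core:
        return 0
    best = None
    for order in permutations(core):
        length = len(order[0])
        for a, b in zip(order, order[1:]):
            length += len(b) - overlap(a, b)
        if best is None or length < best:
            best = length
    return best


def savings(strings):
    """Concatenated length minus optimal superstring length."""
    return sum(len(s) for s in strings) - shortest_superstring_length(strings)


def random_strings(count, length, alphabet, seed=None):
    """Strings whose letters are drawn independently and uniformly (an assumption)."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(length))
            for _ in range(count)]


if __name__ == "__main__":
    # "abcde" contains all three strings, so the savings are 9 - 5 = 4.
    print(savings(["abc", "bcd", "cde"]))  # prints 4
    print(savings(random_strings(6, 5, "ACGT", seed=0)))
```

Averaging `savings` over many random samples gives an empirical counterpart to the average behavior the abstract refers to, although the factorial running time limits this sketch to a handful of strings.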

Keywords

SHORTEST COMMON SUPERSTRING; ENTROPY; PATTERN MATCHING; DATA COMPRESSION; DNA SEQUENCING

MSC classification

Primary: 60C05: Combinatorial probability; 68R15: Combinatorics on words

Type: Research Papers
Information: Journal of Applied Probability, Volume 33, Issue 4, December 1996, pp. 1112-1126

DOI: https://doi.org/10.2307/3214990
Copyright: Copyright © Applied Probability Trust 1996

Footnotes

Research supported by NSF grant DMS-9206139.
