4 - Suffix arrays
Published online by Cambridge University Press: 03 October 2009
Summary
This chapter addresses the problem of searching a fixed text. The associated data structure described here is known as the Suffix Array of the text. The searching procedure is presented first for a list of strings in Sections 4.1 and 4.2, and then adapted to a fixed text in the remaining sections.
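As a rough illustration of what a suffix array of a fixed text is, the following Python sketch builds one by naively sorting the suffix start positions and then locates a pattern by binary search. The function names `suffix_array` and `occurrences` are hypothetical, and the naive construction is chosen only for brevity; it is not the construction method developed in the chapter.

```python
def suffix_array(text):
    """Start positions of the suffixes of `text`, in lexicographic order.

    Naive construction for illustration only: sorting whole suffixes is
    much slower than dedicated suffix-sorting algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(text, sa, pattern):
    """All start positions of `pattern` in `text`, using the suffix array `sa`."""
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Suffixes in sa[start:lo] all begin with the pattern.
    return sorted(sa[start:lo])


text = "banana"
sa = suffix_array(text)
print(sa)                            # [5, 3, 1, 0, 4, 2]
print(occurrences(text, sa, "ana"))  # [1, 3]
```

Each comparison here may inspect up to m characters, so this simple search costs O(m log n) character comparisons for a text of length n.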
The first three sections consider the problem of searching a list of strings stored in a table. The table is assumed to be fixed and can thus be preprocessed to speed up later accesses to it. Searching for a string in a lexicon or dictionary that fits in the main memory of a computer is one application of this problem.
We describe how to sort the strings of the list lexicographically (in worst-case time proportional to the total length of the strings) so that a binary search algorithm can be applied. Sorting alone, however, is not sufficient for an efficient search. Precomputing and using the longest common prefixes between the strings of the list are the extra ingredients that make the technique very efficient: searching for a string of length m in a list of n strings takes O(m + log n) time.
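To give a feel for how common prefixes help, here is a minimal Python sketch of binary search over a sorted list that remembers how many characters the pattern shares with the two current bounds and skips them in later comparisons. It is a simplified variant, not the chapter's algorithm: it does not use a precomputed table of longest common prefixes between the strings, so its worst case remains O(m log n) rather than O(m + log n). The names `lcp_from` and `search_sorted` are hypothetical.

```python
def lcp_from(x, y, start):
    """Length of the longest common prefix of x and y,
    assuming they already agree on their first `start` characters."""
    k = start
    limit = min(len(x), len(y))
    while k < limit and x[k] == y[k]:
        k += 1
    return k


def search_sorted(strings, pattern):
    """Index of `pattern` in the sorted list `strings`, or -1 if absent."""
    d, f = -1, len(strings)  # search the open interval (d, f)
    ld, lf = 0, 0            # chars shared by pattern with strings[d], strings[f]
    while d + 1 < f:
        i = (d + f) // 2
        s = strings[i]
        # Every string between the bounds shares min(ld, lf) characters
        # with the pattern, so the comparison can start there.
        k = lcp_from(pattern, s, min(ld, lf))
        if k == len(pattern) and k == len(s):
            return i
        if k == len(s) or (k < len(pattern) and s[k] < pattern[k]):
            d, ld = i, k     # s < pattern: continue in the right half
        else:
            f, lf = i, k     # pattern < s: continue in the left half
    return -1


words = sorted(["band", "cane", "banana", "candy", "can", "bandana"])
print(search_sorted(words, "cane"))  # 5
print(search_sorted(words, "ban"))   # -1
```

The list must be sorted with the same character ordering that the comparisons use, which is why the example sorts with Python's default string order.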
- Type: Chapter
- Information: Algorithms on Strings, pp. 146-176
- Publisher: Cambridge University Press
- Print publication year: 2007