8 - Approximate patterns
Published online by Cambridge University Press: 03 October 2009
Summary
In this chapter, we are interested in the approximate search for fixed strings. Several notions of approximation on strings are considered: jokers, differences, and mismatches.
A joker is a symbol meant to represent all the letters of the alphabet. The solutions to the problem of searching a text for a pattern containing jokers use specific methods that are described in Section 8.1.
More generally, approximate pattern matching consists in locating all the occurrences of factors inside a text y that are similar to a string x. It consists in producing the positions of the factors of y that are at distance at most k from x, for a given natural integer k. We assume in the rest that k < |x| ≤ |y|. We consider two distances for measuring the approximation: the edit distance and the Hamming distance.
The edit distance between two strings u and v, that are not necessarily of the same length, is the minimum cost of a sequence of elementary edit operations between these two strings (see Section 7.1). The method at the basis of approximate pattern matching is a natural extension of the alignment method by dynamic programming of Chapter 7. It can be improved by using a restricted notion of distance obtained by considering the minimum number of edit operations rather than the sum of their costs.
- Type
- Chapter
- Information
- Algorithms on Strings , pp. 287 - 331Publisher: Cambridge University PressPrint publication year: 2007