  • Print publication year: 2008
  • Online publication date: January 2011

8 - Data Structures for Strings


Up to now we have always assumed that the data items are of constant size and that key values can be compared in constant time, so essentially that they are numbers. A very important class of objects for which these assumptions fail is the class of strings. In real applications, text processing is more important than the processing of numbers, and text fragments have a length; they are not elementary objects that the computer can process in a single step. So we need different structures for strings than for numeric keys. In particular, balanced binary search trees, our most useful previous tool, require a key comparison in each node, and since comparing two strings can take time proportional to their length, these trees are quite inefficient as a dictionary structure for strings. Also, for strings we will ask different questions. Even though strings can be ordered lexicographically, this order does not reflect the similarity of strings: two strings that differ only in the first character are more closely related than two strings that differ from the third to the tenth character, yet the first pair can lie far apart in lexicographic order. Thus, range searching makes little sense for strings.

The concept of strings is not entirely uniform and therefore requires some attention. We have an underlying alphabet A, for example, the ASCII codes, and strings are sequences of characters from this alphabet. But for use in the computer, we need one further important piece of information: how to recognize where the string ends. There are two solutions for this: we can have an explicit termination character, which is added at the end of each string but must not occur within the string, or we can store the length of each string together with the string.
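The two conventions can be sketched in C as follows. This is only a minimal illustration under stated assumptions, not the representation used later in the chapter: the termination character is taken to be '\0', as in ordinary C strings, and the names `terminated_length`, `counted_string`, `counted_from_terminated`, and `counted_compare` are hypothetical, introduced here for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convention 1: explicit termination character (assumed here to be '\0').
   The length is not stored, so finding the end costs a scan of the string. */
size_t terminated_length(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != '\0')
        n++;
    return n;
}

/* Convention 2: the length is stored together with the characters.
   Any character value, including '\0', may then occur inside the string. */
typedef struct {
    size_t length;       /* number of characters in the string */
    const char *chars;   /* the characters; no terminator is required */
} counted_string;        /* hypothetical type name */

/* Convert a terminated string into the length-based form (no copy is made). */
counted_string counted_from_terminated(const char *s)
{
    counted_string r;
    r.length = terminated_length(s);
    r.chars = s;
    return r;
}

/* Lexicographic comparison of two counted strings. The running time is
   proportional to the length of the shorter string, not constant.
   Returns a negative, zero, or positive value. */
int counted_compare(counted_string a, counted_string b)
{
    size_t m = a.length < b.length ? a.length : b.length;
    int c = memcmp(a.chars, b.chars, m);
    if (c != 0)
        return c;
    /* One string is a prefix of the other: the shorter one comes first. */
    return (a.length > b.length) - (a.length < b.length);
}
```

The trade-off is visible in the sketch: with a termination character the end of the string is found only by scanning, and that character is unavailable inside strings, whereas with a stored length the end is known immediately at the cost of one extra field per string.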