9 - Hash Tables
Published online by Cambridge University Press: 10 November 2016
Summary
Calculating instead of Comparing
In what we have seen so far, the search time for an element could be reduced from O(n) to O(log n), which is a significant improvement. However, even logarithmic time may be too high a price to pay for certain applications. Consider, for instance, a very large computer program with thousands of variables. We sometimes write quite complicated arithmetic expressions involving many of these variables, and having to search for the value of each of them may noticeably increase the execution time.
We can get faster access by changing the approach altogether. In previous chapters, an element x was sought by comparing its value to those of some elements stored in a list or a tree, until x was found or could be declared missing. The new approach is to access the data structure at an address that is not the result of a comparison, but can be calculated from x alone.
The basic idea is not new and is familiar to everybody. Many institutions, like schools, stores, hospitals, or clubs, used to assign membership numbers to their students, clients, patients, or members. If a group had n adherents, these numbers were 1 to n, and information about member i was stored in a table T at entry T[i]. With the growing number of such organizations, it became impractical to manage different indices for a single person, and it was preferred to use some official, unique identification number, which many countries provide to each of their inhabitants as an ID or Social Security number. These numbers have to be much larger than n and generally consist of 8 to 10 digits.
It is obviously not reasonable to allocate a table with on the order of 1 billion entries only to get direct access to some n elements by using their ID numbers as indices, when n may be a few hundred. The 8- to 10-digit ID number should thus be shortened. Bank tellers realized this long ago, when they had to compile, by hand, a list of the check numbers for a batch of checks: they did not use the full check numbers, but rather only, say, their three rightmost digits, hoping that there would be no clashes.
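The shortening trick can be illustrated with a minimal Python sketch. The function names `short_key` and `build_table`, the sample ID numbers, and the choice to merely record clashes are assumptions made for illustration; the text above does not say how clashes should be handled. Keeping the three rightmost decimal digits of a number is the same as taking it modulo 1000, so a table of 1000 entries suffices instead of one with a billion.

```python
def short_key(id_number: int, digits: int = 3) -> int:
    """Keep only the rightmost `digits` decimal digits of id_number."""
    return id_number % 10 ** digits


def build_table(id_numbers, digits: int = 3):
    """Store each ID at the entry given by its shortened key.

    Returns the table and a list of clashes, i.e. pairs
    (new_id, stored_id) whose shortened keys coincide.
    Clashes are only reported here, not resolved.
    """
    table = [None] * 10 ** digits      # 1000 entries for 3 digits
    clashes = []
    for id_number in id_numbers:
        slot = short_key(id_number, digits)
        if table[slot] is None:
            table[slot] = id_number    # direct access, no comparisons
        else:
            clashes.append((id_number, table[slot]))
    return table, clashes


# Hypothetical 9-digit ID numbers; the first and third end in 789.
ids = [123456789, 987654321, 555000789]
table, clashes = build_table(ids)
```

With these sample values, the first two IDs land in entries 789 and 321, while the third also maps to entry 789 and is reported as a clash. This shows why the bank teller can only hope that the shortened numbers remain distinct.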
Basic Concepts in Data Structures, pp. 127-151. Publisher: Cambridge University Press. Print publication year: 2016.