  • Print publication year: 2010
  • Online publication date: July 2014

7 - Indexing, Search, and Retrieval of Vectors

Summary

As we have seen in the previous chapters, it is common to map the relevant features of the objects in a database onto the dimensions of a vector space and perform nearest neighbor or range search queries in this space (Figure 7.1). The nearest neighbor query returns a predetermined number of database objects that are closest to the query object in the feature space. The range query, on the other hand, identifies and returns those objects whose distance from the query object is less than a provided threshold.
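To make the two query types concrete, here is a minimal brute-force sketch. It assumes vectors are tuples of floats and that Euclidean distance is the metric; the text does not fix a metric. The function names `knn_query` and `range_query` are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def knn_query(db: List[Vector], q: Vector, k: int) -> List[Tuple[float, int]]:
    """Return (distance, index) pairs for the k objects closest to q.

    Assumes Euclidean distance; ties are broken by the smaller index.
    """
    return heapq.nsmallest(k, ((math.dist(v, q), i) for i, v in enumerate(db)))


def range_query(db: List[Vector], q: Vector, radius: float) -> List[int]:
    """Return indices of objects whose distance from q is less than radius."""
    return [i for i, v in enumerate(db) if math.dist(v, q) < radius]


db = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
print(knn_query(db, (0.0, 0.0), 2))      # two closest objects
print(range_query(db, (0.0, 0.0), 2.5))  # objects strictly within 2.5
```

Both functions compute a distance for every object in the database. That linear cost is exactly what makes a full scan impractical at scale.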

A naive way of executing these queries is to keep a lookup file containing the vector representations of all the objects in the database and scan this file for matches, pruning those objects that do not satisfy the search condition. Although this approach might be feasible for small databases whose objects all fit in main memory, a full scan quickly becomes infeasible as the database grows. Instead, multimedia database systems use specialized indexing techniques that speed up search by pruning irrelevant portions of the space and focusing on the parts likely to satisfy the search predicate (Figure 7.2).
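As a rough illustration of pruning, the sketch below swaps in a simple uniform grid. This is an assumption made for brevity, not one of the index structures the chapter describes. The hypothetical `GridIndex` class buckets points by cell. A range query visits only the cells that overlap the query's bounding box and counts its distance computations, so the savings over a full scan are visible.

```python
import math
from collections import defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple


class GridIndex:
    """Hypothetical uniform-grid index over d-dimensional points."""

    def __init__(self, points: List[Sequence[float]], cell: float):
        self.points = points
        self.cell = cell
        # Map each cell coordinate to the indices of the points it contains.
        self.cells: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
        for i, p in enumerate(points):
            self.cells[self._key(p)].append(i)

    def _key(self, p: Sequence[float]) -> Tuple[int, ...]:
        return tuple(math.floor(x / self.cell) for x in p)

    def range_query(self, q: Sequence[float], radius: float) -> Tuple[List[int], int]:
        """Return (sorted matching indices, number of distance computations)."""
        lo = self._key([x - radius for x in q])
        hi = self._key([x + radius for x in q])
        matches, checked = [], 0
        # Only cells intersecting the query's bounding box are visited;
        # points in all other cells are pruned without computing a distance.
        for key in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
            for i in self.cells.get(key, []):
                checked += 1
                if math.dist(self.points[i], q) < radius:
                    matches.append(i)
        return sorted(matches), checked


pts = [(float(x), float(y)) for x in range(10) for y in range(10)]
grid = GridIndex(pts, cell=1.0)
print(grid.range_query((0.0, 0.0), 1.5))
```

On this 10 × 10 lattice with unit cells, the query at the origin with radius 1.5 computes 4 distances instead of the 100 a full scan would need. A grid works poorly in high dimensions because the number of cells grows exponentially, which is one reason real systems use more elaborate structures.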

Index structures that support range or nearest neighbor searches generally lay out the data on disk in sorted order (Figure 7.3(a)). Given a pointer to a data element on disk, this layout makes it possible to restrict further reads to the disk pages in the immediate neighborhood of that element (Figure 7.3(b)).
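The benefit of a sorted layout can be sketched with a toy paged file. The `SortedPagedFile` class is hypothetical: keys are one-dimensional numbers kept in sorted order and split into fixed-size pages. A binary search over the keys stands in for the pointer an index would supply, since a real system would not hold every key in memory. Starting from that position, a range query reads consecutive pages and stops once it passes the upper bound.

```python
import bisect
import math
from typing import List, Tuple


class SortedPagedFile:
    """Toy model of records stored in sorted order across fixed-size pages."""

    def __init__(self, keys: List[float], page_size: int = 4):
        self.page_size = page_size
        # Page p holds keys[p * page_size : (p + 1) * page_size].
        self.keys = sorted(keys)

    def num_pages(self) -> int:
        return math.ceil(len(self.keys) / self.page_size)

    def read_page(self, p: int) -> List[float]:
        start = p * self.page_size
        return self.keys[start:start + self.page_size]

    def range_query(self, lo: float, hi: float) -> Tuple[List[float], int]:
        """Return (keys in [lo, hi], number of pages read)."""
        # Stand-in for an index lookup: position of the first key >= lo.
        pos = bisect.bisect_left(self.keys, lo)
        result, pages_read = [], 0
        p = pos // self.page_size
        while p < self.num_pages():
            page = self.read_page(p)
            pages_read += 1
            result.extend(k for k in page if lo <= k <= hi)
            if page[-1] > hi:  # sorted order: no later page can hold a match
                break
            p += 1
        return result, pages_read


f = SortedPagedFile(list(range(0, 100, 5)), page_size=4)
print(f.range_query(22, 48))
```

With 20 keys (0, 5, ..., 95) on 4-key pages, the range [22, 48] is answered by reading 2 of the 5 pages. The keys here are one-dimensional for simplicity. Ordering multidimensional vectors on disk requires additional machinery that this sketch does not attempt.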