Introduction
Multimedia here refers to images, audio and video, and multimedia information retrieval is the process of searching for and finding such documents through a multimedia search engine. This chapter will take image retrieval as an example to discuss interaction models and interfaces developed for multimedia information retrieval and to illustrate information-seeking behaviour in relation to image search.
Current image search engines are mainly based on keyword annotations or on information extracted from the image's context (e.g. surrounding web page text). This approach has three limitations. First, manual annotation of images requires significant effort and thus may not be practical for large image collections. Second, as the complexity of an image increases, capturing its content in text alone becomes increasingly difficult. Finally, it relies on the user being able to articulate and enter a text description of their information need using the same vocabulary (and language) as the text annotations.
Content-based image retrieval was proposed in the early 1990s to overcome these limitations (Rui et al., 1998). It uses images rather than keywords as the query (discussed in more detail in Chapter 13, ‘Multimedia: information representation and access’). Content-based image retrieval systems have since been used primarily for searching collections with limited annotations, or for searches where annotation is not required, such as trademark search (Eakins, Riley and Edwards, 2003). More recently, Google launched ‘Google Goggles’, a content-based search application for Google Android mobile phones that allows people to find more information about a famous landmark or work of art simply by submitting a photo of that object (Jamaal, 2010).
A basic content-based image retrieval system interprets the content (e.g. colour, texture and shape) of the images in the query and of the images in the collection, calculates the similarity between the query images and the collection images, and ranks the collection images according to their degree of relevance to the query (Marques and Furht, 2002).
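To make this pipeline concrete, the following is a minimal sketch in Python (not drawn from the chapter), assuming normalised RGB colour histograms as the content descriptor, histogram intersection as the similarity measure, and placeholder file names; it relies on the NumPy and Pillow libraries and stands in for whatever richer features and similarity functions a real system would use.

import numpy as np
from PIL import Image

def colour_histogram(path, bins=8):
    """Extract a normalised RGB colour histogram as a simple content descriptor."""
    img = np.asarray(Image.open(path).convert("RGB"))
    # One histogram per colour channel, concatenated into a single feature vector.
    hist = np.concatenate([
        np.histogram(img[..., c], bins=bins, range=(0, 255))[0]
        for c in range(3)
    ]).astype(float)
    return hist / hist.sum()

def similarity(h1, h2):
    """Histogram intersection: higher values mean more similar images."""
    return np.minimum(h1, h2).sum()

def rank_collection(query_path, collection_paths):
    """Rank the collection images by their similarity to the query image."""
    query_hist = colour_histogram(query_path)
    scored = [(path, similarity(query_hist, colour_histogram(path)))
              for path in collection_paths]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical usage; the file names are placeholders.
# for path, score in rank_collection("query.jpg", ["a.jpg", "b.jpg", "c.jpg"]):
#     print(f"{path}: {score:.3f}")

The three functions mirror the three steps described above: feature extraction from the query and collection images, pairwise similarity calculation, and ranking of the collection by relevance to the query.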