Web Search

Serge Abiteboul; Ioana Manolescu; Philippe Rigaux; Marie-Christine Rousset; Pierre Senellart

doi:10.1017/CBO9780511998225.014

13 - Web Search

from Part 3 - Building Web Scale Applications

Published online by Cambridge University Press: 05 June 2012


Serge Abiteboul: Affiliation:
INRIA Saclay – Île-de-France
Ioana Manolescu: Affiliation:
INRIA Saclay – Île-de-France
Philippe Rigaux: Affiliation:
Conservatoire National des Arts et Métiers, Paris
Marie-Christine Rousset: Affiliation:
Université de Grenoble, France
Pierre Senellart: Affiliation:
Télécom ParisTech, France


Summary

With tens of billions of freely accessible documents, and a size that keeps growing, one of the major issues raised by the World Wide Web is that of searching these documents effectively and efficiently to find those that best suit a user's need. The purpose of this chapter is to describe the techniques at the core of today's search engines (such as Google, Bing, or Exalead), that is, mostly keyword search in very large collections of text documents. We also briefly touch upon other techniques and research issues that may be of importance in next-generation search engines.

This chapter is organized as follows. In Section 13.1, we briefly recall the Web and the languages and protocols it relies upon. Most of these topics have already been covered earlier in the book, and their introduction here is mostly intended to make the present chapter self-contained. We then present in Section 13.2 the techniques that can be used to retrieve pages from the Web, that is, to crawl it, and to extract text tokens from them. First-generation search engines, exemplified by Altavista, mostly relied on the classical information retrieval (IR) techniques, applied to text documents, that are described in Section 13.3. The advent of the Web, and more generally the steady growth of document collections managed by institutions of all kinds, has led to extensions of these techniques. We address scalability issues in Section 13.3.3, with a focus on centralized indexing. Distributed approaches are investigated in Chapter 14.

Type: Chapter
Information: Web Data Management, pp. 247-286

DOI: https://doi.org/10.1017/CBO9780511998225.014

Publisher: Cambridge University Press

Print publication year: 2011
