Book contents
- Frontmatter
- Contents
- Introduction
- Part 1 Modeling Web Data
- Part 2 Web Data Semantics and Integration
- Part 3 Building Web Scale Applications
- 13 Web Search
- 14 An Introduction to Distributed Systems
- 15 Distributed Access Structures
- 16 Distributed Computing with MapReduce and Pig
- 17 Putting into Practice: Full-Text Indexing with Lucene
- 18 Putting into Practice: Recommendation Methodologies
- 19 Putting into Practice: Large-Scale Data Management with Hadoop
- 20 Putting into Practice: CouchDB, a JSON Semistructured Database
- Bibliography
- Index
14 - An Introduction to Distributed Systems
from Part 3 - Building Web Scale Applications
Published online by Cambridge University Press: 05 June 2012
Summary
This chapter is an introduction to very large data management in distributed systems. Here, "very large" means a context where gigabytes (1,000 MB = 10^9 bytes) constitute the unit size for measuring data volumes. Terabytes (10^12 bytes) are commonly encountered, and many Web companies and scientific or financial institutions must deal with petabytes (10^15 bytes). In the near future, we can expect exabyte (10^18 bytes) data sets, with the worldwide digital universe roughly estimated (in 2010) at about 1 zettabyte (10^21 bytes).
Distribution is key to handling very large data sets. Distribution is necessary (but not sufficient) to achieve scalability (i.e., the ability to maintain stable performance for steadily growing data collections by adding new resources to the system). However, distribution brings a number of technical problems that make the design and implementation of distributed storage, indexing, and computing a delicate issue. A prominent concern is the risk of failure. In an environment consisting of hundreds or thousands of computers (a common setting for large Web companies), failures of components (hardware, network, local systems, disks) are frequent, and the system must be ready to cope with them at any moment.
Our presentation covers principles and techniques that recently emerged to handle Web-scale data sets. We examine the extension of traditional storage and indexing methods to large-scale distributed settings. We describe techniques to efficiently process point queries that aim at retrieving a particular object.
- Type: Chapter
- Information: Web Data Management, pp. 287–309
- Publisher: Cambridge University Press
- Print publication year: 2011