Book contents
- Frontmatter
- Contents
- Introduction
- Part 1 Modeling Web Data
- Part 2 Web Data Semantics and Integration
- Part 3 Building Web Scale Applications
- 13 Web Search
- 14 An Introduction to Distributed Systems
- 15 Distributed Access Structures
- 16 Distributed Computing with MapReduce and Pig
- 17 Putting into Practice: Full-Text Indexing with Lucene
- 18 Putting into Practice: Recommendation Methodologies
- 19 Putting into Practice: Large-Scale Data Management with Hadoop
- 20 Putting into Practice: CouchDB, a JSON Semistructured Database
- Bibliography
- Index
20 - Putting into Practice: CouchDB, a JSON Semistructured Database
from Part 3 - Building Web Scale Applications
Published online by Cambridge University Press: 05 June 2012
Summary
This chapter proposes exercises and projects based on CouchDB, a recent database system that relies on many of the concepts presented so far in this book. In brief:
- CouchDB adopts a semistructured data model based on the JSON (JavaScript Object Notation) format; JSON offers a lightweight alternative to XML.
- A database in CouchDB is schema-less: the structure of JSON documents may vary freely from one document to the next, depending on their specific features.
- To cope with the absence of constraints, which is the counterpart of this flexibility, CouchDB proposes an original approach based on structured materialized views that can be produced from document collections.
- Views are defined with the MapReduce paradigm, which allows both parallel computation and incremental maintenance of their content.
- Finally, the system aspects of CouchDB illustrate most of the distributed data management techniques covered in the last part of this book: distribution based on consistent hashing, support for data replication and reconciliation, horizontal scalability, parallel computing, and so forth.
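The points above about schema-less documents and MapReduce views can be illustrated with a small in-memory sketch. It is a toy model written in Python, not CouchDB's API: CouchDB views are normally written as JavaScript functions stored in design documents. The sample movie documents and the names `map_movie`, `reduce_count`, `build_view`, and `add_document` are hypothetical, introduced only for illustration.

```python
from collections import defaultdict

# Schema-less collection: documents share no fixed structure (hypothetical sample data).
docs = [
    {"_id": "m1", "title": "Vertigo", "year": 1958, "director": "Hitchcock"},
    {"_id": "m2", "title": "Psycho", "year": 1960, "director": "Hitchcock",
     "actors": ["Perkins", "Leigh"]},
    {"_id": "m3", "title": "Alien", "year": 1979},  # no director field recorded
]

def map_movie(doc):
    """Map step: emit (key, value) pairs; a document without the field emits nothing."""
    if "director" in doc:
        yield doc["director"], 1

def reduce_count(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

def build_view(documents):
    """Materialize the view: group the map output by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_movie(doc):
            groups[key].append(value)
    return {key: reduce_count(values) for key, values in sorted(groups.items())}

def add_document(view, doc):
    """Incremental maintenance: map only the new document and fold its output
    into the stored result (assumes 0 is the neutral element of the reduce)."""
    for key, value in map_movie(doc):
        view[key] = reduce_count([view.get(key, 0), value])
    return view

view = build_view(docs)
print(view)
add_document(view, {"_id": "m4", "title": "Blade Runner", "director": "Scott"})
print(view)
```

The view imposes structure (one count per director) on documents that have none in common, and adding a document touches only the keys that document emits. This is the property that makes incremental maintenance cheap in this sketch; because each document is mapped independently, the map step could also be run in parallel.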
CouchDB is representative of the emergence of so-called key-value store systems that give up many features of the relational model, including schema, structured querying, and consistency guarantees, in favor of flexible data representation, simplicity, and scalability. It illustrates the “No[tOnly]SQL” trend with an original and consistent approach to the large-scale management of “documents”, viewed as autonomous, rich pieces of information that can be managed independently. Relational databases, by contrast, take the form of a rich graph of interrelated flat tuples.
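The contrast between an autonomous document and a graph of interrelated flat tuples can be made concrete with a short Python sketch. The sample document, the three-table layout, and the helper `to_tuples` are hypothetical, chosen only to illustrate the two representations.

```python
# One self-contained document: everything about the movie travels together
# (hypothetical sample data).
movie_doc = {
    "_id": "m2",
    "title": "Psycho",
    "director": {"name": "Hitchcock", "born": 1899},
    "actors": [
        {"name": "Perkins", "role": "Norman Bates"},
        {"name": "Leigh", "role": "Marion Crane"},
    ],
}

def to_tuples(doc):
    """Decompose the nested document into flat tuples linked by shared key values."""
    director = doc["director"]
    movie = (doc["_id"], doc["title"], director["name"])   # refers to an artist by name
    artists = [(director["name"], director["born"])]
    roles = [(doc["_id"], a["name"], a["role"]) for a in doc["actors"]]  # refers to the movie by id
    return {"movie": [movie], "artist": artists, "role": roles}

tables = to_tuples(movie_doc)
for name, rows in tables.items():
    print(name, rows)
```

In the flat form, answering a question about the movie means following key references across tables, whereas the document can be stored, replicated, or moved to another server as a single unit.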
- Type: Chapter
- Information: Web Data Management, pp. 400-420. Publisher: Cambridge University Press. Print publication year: 2011.