Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- 2 A model of distributed computations
- 3 Logical time
- 4 Global state and snapshot recording algorithms
- 5 Terminology and basic algorithms
- 6 Message ordering and group communication
- 7 Termination detection
- 8 Reasoning with knowledge
- 9 Distributed mutual exclusion algorithms
- 10 Deadlock detection in distributed systems
- 11 Global predicate detection
- 12 Distributed shared memory
- 13 Checkpointing and rollback recovery
- 14 Consensus and agreement algorithms
- 15 Failure detectors
- 16 Authentication in distributed systems
- 17 Self-stabilization
- 18 Peer-to-peer computing and overlay graphs
- Index
4 - Global state and snapshot recording algorithms
Published online by Cambridge University Press: 05 June 2012
Summary
Recording the global state of a distributed system on-the-fly is an important paradigm when one is interested in analyzing, testing, or verifying properties associated with distributed executions. Unfortunately, the lack of both a globally shared memory and a global clock in a distributed system, added to the fact that message transfer delays in these systems are finite but unpredictable, makes this problem non-trivial.
This chapter first defines consistent global states (also called consistent snapshots) and discusses the issues that must be addressed to compute consistent distributed snapshots. It then presents several algorithms that determine such snapshots on-the-fly for different types of networks, classified by the properties of their communication channels: FIFO, non-FIFO, and causal delivery.
Introduction
A distributed computing system consists of spatially separated processes that do not share a common memory and communicate asynchronously with each other by message passing over communication channels. Each component of a distributed system has a local state. The state of a process is characterized by the state of its local memory and a history of its activity. The state of a channel is characterized by the set of messages sent along the channel less the messages received along the channel. The global state of a distributed system is a collection of the local states of its components.
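The definitions above can be illustrated with a minimal Python sketch. It assumes an omniscient observer that can see every send and receive at once, which a real distributed system lacks, so it only shows what a global state contains, not how to record one. The names `Channel`, `global_state`, `p1`, `p2`, and `c12` are hypothetical and chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Channel:
    """A unidirectional channel (illustrative type, not from the book)."""
    sent: list = field(default_factory=list)      # messages sent along the channel
    received: list = field(default_factory=list)  # messages received along the channel

    def state(self) -> Counter:
        # Channel state = messages sent less messages received,
        # i.e. the messages still in transit (kept as a multiset).
        return Counter(self.sent) - Counter(self.received)


def global_state(process_states: dict, channels: dict) -> dict:
    # The global state is the collection of the local states of all
    # components: every process and every channel.
    return {
        "processes": dict(process_states),
        "channels": {name: ch.state() for name, ch in channels.items()},
    }


# Hypothetical two-process system with one channel from p1 to p2.
c12 = Channel(sent=["m1", "m2", "m3"], received=["m1"])
snapshot = global_state({"p1": {"x": 5}, "p2": {"x": 7}}, {"c12": c12})
in_transit = sorted(snapshot["channels"]["c12"].elements())
print(in_transit)  # ['m2', 'm3']
```

A multiset (`Counter`) is used rather than a plain set so that two identical messages sent on the same channel are counted separately; this is a design choice of the sketch, not something the text prescribes.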
Recording the global state of a distributed system is an important paradigm and it finds applications in several aspects of distributed system design.
- Type: Chapter
- Information: Distributed Computing: Principles, Algorithms, and Systems, pp. 87-125. Publisher: Cambridge University Press. Print publication year: 2008