Book contents
- Frontmatter
- Contents
- Preface
- 1 Combinatorial Discrepancy
- 2 Upper Bound Techniques
- 3 Lower Bound Techniques
- 4 Sampling
- 5 Geometric Searching
- 6 Complexity Lower Bounds
- 7 Convex Hulls and Voronoi Diagrams
- 8 Linear Programming and Extensions
- 9 Pseudorandomness
- 10 Communication Complexity
- 11 Minimum Spanning Trees
- A Probability Theory
- B Harmonic Analysis
- C Convex Geometry
- Bibliography
- Index
4 - Sampling
Published online by Cambridge University Press: 05 October 2013
Summary
This chapter is about extracting small representative samples from large data sets. In the process we develop a complete computational theory of geometric sampling, with an eye toward the derandomization applications that will be discussed in later chapters. It is difficult to overstate the impact that this theory had on computational geometry in the 1990s.
The combinatorial discrepancy of a set system indicates how well, relative to its constituent subsets, we can sample the ground set by selecting about half of it. It is natural to ask what happens for different sample sizes. At one extreme, we might wonder how well we can sample a set if we are allowed to pick only a constant number of elements. For example, given a finite collection of points in the plane, is it possible to choose a subset of constant size, such that any disk that encloses at least one percent of the points also includes at least one sample point? Surprisingly, the answer is yes.
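The disk question above can be probed empirically. The following is a minimal sketch, not the chapter's construction: it swaps in plain random sampling, and it only tests a batch of randomly generated disks rather than all disks. The function names, the point distribution (uniform in the unit square), and every parameter value, including the sample size of 2000, are illustrative assumptions.

```python
import random


def in_disk(p, center, radius):
    """Return True if point p lies in the closed disk (center, radius)."""
    return (p[0] - center[0]) ** 2 + (p[1] - center[1]) ** 2 <= radius * radius


def count_missed_heavy_disks(n=20000, sample_size=2000, trials=100,
                             eps=0.01, seed=1):
    """Count random 'heavy' disks (holding >= eps*n points) that contain
    no sample point. All parameter values are hypothetical."""
    rng = random.Random(seed)
    # Assumed input: n points drawn uniformly from the unit square.
    points = [(rng.random(), rng.random()) for _ in range(n)]
    # Random sample without replacement, standing in for a careful choice.
    sample = rng.sample(points, sample_size)

    heavy = 0
    missed = 0
    for _ in range(trials):
        center = (rng.random(), rng.random())
        radius = rng.uniform(0.05, 0.5)
        weight = sum(in_disk(p, center, radius) for p in points)
        if weight >= eps * n:  # the disk encloses at least 1% of the points
            heavy += 1
            if not any(in_disk(s, center, radius) for s in sample):
                missed += 1
    return heavy, missed


if __name__ == "__main__":
    heavy, missed = count_missed_heavy_disks()
    print("heavy disks tested:", heavy, "missed by the sample:", missed)
```

A run of this kind only gives evidence for the disks it happens to test. The statement in the text is stronger: a single sample of constant size works for every disk simultaneously.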
In fact, something even stronger and stranger is true: Suppose that we want to estimate how many people live within 10 miles of a hospital in a given country. We can do this by sampling the population carefully, answering the question for the sample, and then scaling up appropriately. What is amazing is that, for a given relative error, the same sample size works just as well whether the country is Switzerland or China! Furthermore, we can change metrics and even lift the problem into higher dimensional space, and this still remains true.
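The population-size claim can be illustrated with a small experiment. This is a sketch under stated assumptions, not the chapter's method: it uses uniform random sampling, a single fixed disk (the "hospital" at a hypothetical center with a hypothetical radius), and synthetic populations drawn uniformly from the unit square. The names `fraction_in_disk`, `sample_estimate`, and `demo` are introduced here for illustration only.

```python
import random


def fraction_in_disk(points, center, radius):
    """Fraction of the given points lying within `radius` of `center`."""
    cx, cy = center
    r2 = radius * radius
    inside = sum(1 for (x, y) in points
                 if (x - cx) ** 2 + (y - cy) ** 2 <= r2)
    return inside / len(points)


def sample_estimate(points, center, radius, sample_size, rng):
    """Estimate the same fraction from a random sample (no replacement)."""
    sample = rng.sample(points, sample_size)
    return fraction_in_disk(sample, center, radius)


def demo(seed=0, sample_size=2000):
    """Compare the sampling error for a small and a large population,
    using the same sample size for both. Values are hypothetical."""
    rng = random.Random(seed)
    center, radius = (0.5, 0.5), 0.3
    errors = {}
    for n in (10_000, 200_000):
        points = [(rng.random(), rng.random()) for _ in range(n)]
        truth = fraction_in_disk(points, center, radius)
        estimate = sample_estimate(points, center, radius, sample_size, rng)
        errors[n] = abs(estimate - truth)
    return errors


if __name__ == "__main__":
    for n, err in demo().items():
        print(f"population {n}: error in estimated fraction = {err:.4f}")
```

The error in the estimated fraction should be of similar magnitude for both populations, even though one is twenty times larger. Note that this checks only one query region; the result described in the text holds for all such regions at once.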
- Type: Chapter
- Book: The Discrepancy Method: Randomness and Complexity, pp. 169 - 202
- Publisher: Cambridge University Press
- Print publication year: 2000