In attempts to implement probabilistic survey designs in areas of reduced surface visibility, archaeologists have turned to shovel testing, or Test Pit Sampling (TPS). Characteristically, TPS involves excavating small, systematically spaced test pits within larger survey units to search for archaeological materials that would otherwise go undiscovered. Although TPS has been the subject of considerable study, most of that work has been theoretical. As a result, while the general characteristics of TPS are understood, it is not known how well the method performs on known archaeological sites. This article reports research aimed at estimating the reliability and validity of the test pit method when applied to known archaeological sites under varying conditions of artifact density and spatial clustering. Split-half correlations and logistic regressions show that TPS is reliable, in the sense that it produces replicable results, but that it is biased against the discovery of small, low-density sites, especially when their artifacts are highly clustered. A model relating TPS to regional survey in general is presented, and a means of estimating the method's potential biases is illustrated.
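The direction of the reported bias can be illustrated with a small Monte Carlo simulation. The sketch below is a hypothetical illustration, not the model or data used in this study: it assumes a circular site, artifact density in items per square metre, a square grid of 0.5 m square pits at 10 m spacing laid at a random offset, and (for clustered sites) artifacts scattered with a Gaussian spread around random cluster centres. All function names and parameter values are illustrative assumptions. A site counts as "discovered" if at least one artifact falls inside at least one pit.

```python
import math
import random


def random_point_in_disc(radius, rng):
    """Draw a point uniformly from a disc of the given radius centred at (0, 0)."""
    r = radius * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return r * math.cos(theta), r * math.sin(theta)


def simulate_site(radius, density, clustered, rng, cluster_size=20, spread=0.5):
    """Return artifact coordinates (metres) for a hypothetical circular site.

    The artifact count is fixed at density * site area. If `clustered` is True,
    artifacts are assigned to roughly n / cluster_size random centres and
    scattered around them with a Gaussian of s.d. `spread` (a simple
    Thomas-style process, assumed here); otherwise they are uniform over the site.
    """
    n = round(density * math.pi * radius ** 2)
    if not clustered:
        return [random_point_in_disc(radius, rng) for _ in range(n)]
    n_clusters = max(1, round(n / cluster_size))
    centres = [random_point_in_disc(radius, rng) for _ in range(n_clusters)]
    points = []
    for _ in range(n):
        cx, cy = rng.choice(centres)
        points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
    return points


def tps_discovers(points, spacing, pit_side, rng):
    """Lay a square grid of square test pits at a random offset and report
    whether any artifact falls inside a pit (assumes pit_side < spacing)."""
    ox, oy = rng.uniform(0.0, spacing), rng.uniform(0.0, spacing)
    half = pit_side / 2.0
    for x, y in points:
        # Only the pit centred on the nearest grid node can contain this artifact.
        gx = round((x - ox) / spacing) * spacing + ox
        gy = round((y - oy) / spacing) * spacing + oy
        if abs(x - gx) <= half and abs(y - gy) <= half:
            return True
    return False


def discovery_probability(radius, density, clustered, spacing=10.0,
                          pit_side=0.5, trials=1000, seed=1):
    """Monte Carlo estimate of the chance that TPS finds at least one artifact."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        site = simulate_site(radius, density, clustered, rng)
        if tps_discovers(site, spacing, pit_side, rng):
            hits += 1
    return hits / trials
```

Comparing `discovery_probability(15, 0.5, clustered=False)` with `discovery_probability(15, 0.5, clustered=True)` should give a lower value for the clustered site. Under these assumptions every artifact has the same chance of landing in a pit (pit area divided by grid cell area, here 0.25/100), so clustering does not change the expected number of artifacts recovered; it concentrates them, so the pits more often recover many artifacts from one cluster or none at all. Small, sparse sites fare worst because the same per-artifact chance is multiplied over very few artifacts.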