Data without generalisation is just gossip. (Pirsig 1991: 55, in Chambers 2003: xix)
So you want to investigate the language used by a group of people. One of the first questions you might ask yourself is: Who do I collect these data from? A crucial element of empirical linguistic work is to choose not only what type of data to collect (e.g., naturally occurring data, interview data, questionnaire data, experimental data; see Part I of this volume), but also which people to target for data collection. The most reliable method for finding out about the language use of a particular group of people would be to collect linguistic information from every single person in the population, a term that in the social sciences refers to all members of the community. Obviously, except for very small populations, this method is impractical, expensive, and time-consuming. Hence, most researchers only target “some people in the group in such a way that their responses and characteristics reflect those of the group from which they are drawn . . . This is the principle of sampling” (De Vaus 2001: 60). The subgroup of people that reflects the population as a whole (in terms of their social and linguistic characteristics), and therefore lends itself to generalizations beyond the scope of the study, is called a representative sample. The question we need to ask as linguists is: To what extent are the findings reported on the basis of a subsample representative of the linguistic habits of a certain population or group?
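The principle of sampling can be made concrete with a small sketch in Python. It is an illustration only, not a procedure taken from the sources cited above: the population, its age groups, and the function names are all hypothetical, and simple random sampling is used merely as one familiar way of drawing a subgroup. The sketch draws 500 people from an invented population of 10,000 and compares the share of each age group in the sample with its share in the population.

```python
import random
from collections import Counter


def proportions(people, key):
    """Return the share of each value of `key` among `people`."""
    counts = Counter(person[key] for person in people)
    total = len(people)
    return {value: count / total for value, count in counts.items()}


def draw_simple_random_sample(population, sample_size, seed=0):
    """Draw `sample_size` people without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)


# Hypothetical population of 10,000 speakers (invented figures, not real data).
population = (
    [{"age_group": "under 30"} for _ in range(3000)]
    + [{"age_group": "30-59"} for _ in range(5000)]
    + [{"age_group": "60+"} for _ in range(2000)]
)

sample = draw_simple_random_sample(population, sample_size=500)

population_shares = proportions(population, "age_group")
sample_shares = proportions(sample, "age_group")

# Compare how closely the sample mirrors the population on this one trait.
for group, share in population_shares.items():
    print(f"{group}: population {share:.2f}, sample {sample_shares.get(group, 0.0):.2f}")
```

If the shares in the sample lie close to those in the population, the sample reflects the population on this one characteristic. Whether it is representative in the fuller sense described above depends on every social and linguistic characteristic relevant to the study, which a check of this kind cannot establish on its own.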