Introduction: The Problem, and a Possible Solution
Natural Language Data
Linguists appear to be in an enviable position among scientists. The lifeblood of science is data, and unlike, say, glaciologists, who can only collect primary material for their research in remote and generally rather inhospitable parts of the planet, or particle physicists, who require access to massive, extremely expensive, and sometimes erratic machines – with demand for access to such machines far exceeding supply – linguists are literally surrounded by the kind of data that make up the target of their investigations. It's true that field linguists often need informants who live at a considerable distance from where they themselves live, and experimental linguists often need laboratories with elaborate and sophisticated equipment. But for syntacticians – linguists who investigate the structure of sentences, a large fraction of whom (possibly a majority) study sentences in their own respective languages – matters are as convenient as they could possibly be. Syntacticians have intuitive access to all of the sentences made available by their own knowledge of their language, as well as the speech (and reactions) of their fellows in constant use around them, and ready-made corpora in the form of written materials and electronic records, many of which are available for searches based on word sequences (Google, for example, is a valuable source of data for both syntacticians and morphologists). Learning how to take advantage of this vast pool of readily available data is a major component of syntacticians’ training.
In a sense, of course, the true data of syntax are not strings of words themselves, but judgments about the status of those strings of words. The syntactician's primary responsibility is to give an account of how it is that certain strings of words have the status of sentences, while others do not, and still others have a kind of shadowy intermediate status – not bad enough to be outright rubbish, but not good enough to pass completely unnoticed in conversation as utterly and tediously normal. For example, consider the status of the three word strings in (1):
(1) a. I asked Robin to leave the room.
b. I requested Robin to leave the room.
c. I inquired (of) Robin to leave the room.