The question this chapter raises is whether it is always possible to collect useful validity information for new vocabulary tests which claim to target specific aspects of knowledge. The difficulties entailed in collecting such information may help explain why some tests come into general use before such data are available. The quest for objective measuring tools which can quantify lexical knowledge has spawned a plethora of L2 vocabulary tests. Several of these are mentioned elsewhere in this volume, for example the Vocabulary Levels Test (Nation, 1983), the Lexical Frequency Profile (Laufer and Nation, 1995), P-Lex (Meara and Bell, 2001), X-Lex (Meara and Milton, 2003b), and various applications of Type-Token Ratios (TTR) (for example Arnaud, 1984; Laufer, 1991). The practical nature of tests such as these – they tend to be relatively quick to administer and mark, and produce a numerical score – makes them extremely attractive to EFL teachers, who are often required to assess the proficiency or progress of large numbers of students. As a consequence, tests are often used to make judgements about the language level of non-native speakers and/or about the lexical richness of a text (see the concerns of van Hout and Vermeer, Chapter 5, this volume) before there is conclusive evidence that the tests themselves produce reliable and valid results.