Introduction: lexical richness measures
This chapter builds on the previous chapter's consideration of how we try to measure and assess the nature of the productive lexicon by looking at these measures in more detail. It addresses in particular the concern that measures which aim to reveal the richness of the lexicon used to produce a text generally suffer from reliability and/or validity problems. We therefore discuss these problems and then explore the possibility of combining three elements that seem to work relatively successfully: (a) lexical (frequency) layers, (b) alternative type-token functions, and (c) (re)sampling procedures.
Several lexical richness measures are used in research on language acquisition, the most popular being the Type-Token Ratio. This measure is provided, for instance, by CLAN (Computerised Language Analysis Program), which comprises the analytic tools for data formatted according to the CHAT (Codes for Human Analysis of Transcripts) guidelines. Both CLAN and CHAT belong to CHILDES (Child Language Data Exchange System), the successful databank on (first) language acquisition (cf. MacWhinney, 2000a and b). The Type-Token Ratio is calculated for data files containing transcribed utterances. In language acquisition research, these are often transcriptions of spontaneous speech, including unguided narratives, guided retellings, or speech elicited by a series of successive pictures.
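By way of illustration, the Type-Token Ratio is the number of different word forms (types) divided by the total number of word forms (tokens) in a sample. The calculation might be sketched as follows. This is a minimal sketch and not CLAN's own implementation: the function name `type_token_ratio`, the lowercasing step, and the simple letters-and-apostrophes tokenisation are assumptions made for the example, and real CHAT transcripts would first need their speaker codes and annotation tiers removed.

```python
import re


def type_token_ratio(utterances):
    """Return (number of types, number of tokens, TTR) for a list of utterances."""
    tokens = []
    for utterance in utterances:
        # Assumption: a token is a lowercased run of letters or apostrophes.
        tokens.extend(re.findall(r"[a-z']+", utterance.lower()))
    if not tokens:
        raise ValueError("no tokens found in the sample")
    types = set(tokens)  # each distinct word form counts once
    return len(types), len(tokens), len(types) / len(tokens)


# Hypothetical two-utterance sample, invented for the example.
sample = ["the dog sees the cat", "the cat runs"]
n_types, n_tokens, ttr = type_token_ratio(sample)
print(n_types, n_tokens, ttr)
```

In this invented sample, eight tokens are realised by five types, so the ratio is 5/8 = 0.625.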