On the relation between types and tokens in literary text

Barron Brainerd

doi:10.2307/3212322

Published online by Cambridge University Press: 14 July 2016

Affiliation: University of Toronto


Abstract

The ratio of the number X_n of different words (types) in a text to its length n in words (tokens) has received considerable attention in the literature of statistical linguistics. The present note presents two stochastic models for X_n, each based on an inhomogeneous discrete Markov process of the pure birth type in which the transition probabilities take certain forms depending only upon n. These models are then tested against data obtained from the plays of William Shakespeare.
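
As a rough illustration of the quantities the abstract names, the sketch below counts types against tokens in a word list and simulates a pure-birth chain whose probability of a new type at step k depends only on k. The function names and the form of `hypothetical_p` are assumptions made for this example. The abstract does not state the transition probabilities used in the paper, so this is not a reconstruction of either of its two models.

```python
import random
from typing import Callable, List


def type_counts(tokens: List[str]) -> List[int]:
    """Return X_1..X_n: the number of distinct types among the first k tokens."""
    seen = set()
    counts = []
    for tok in tokens:
        seen.add(tok)
        counts.append(len(seen))
    return counts


def simulate_pure_birth(n: int, p: Callable[[int], float], seed: int = 0) -> List[int]:
    """Simulate X_1..X_n for a pure-birth chain.

    At step k the count rises by one with probability p(k) and otherwise
    stays put; p depends only on the step index k, not on the current count.
    """
    rng = random.Random(seed)
    x = 1  # the first token is necessarily a new type
    path = [x]
    for k in range(1, n):
        if rng.random() < p(k):
            x += 1
        path.append(x)
    return path


def hypothetical_p(k: int, a: float = 50.0) -> float:
    """Illustrative decreasing probability of a new type (an assumption, not the paper's form)."""
    return a / (a + k)


if __name__ == "__main__":
    words = "to be or not to be".split()
    print(type_counts(words))  # [1, 2, 3, 4, 4, 4]
    path = simulate_pure_birth(1000, hypothetical_p)
    print(path[-1] / len(path))  # simulated type-token ratio at n = 1000
```

Comparing `type_counts` on a real text with paths from `simulate_pure_birth` under a candidate `p` is one simple way to see whether a given form of the transition probability is plausible.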

Keywords

TYPE-COUNT; TOKEN-COUNT; SHAKESPEARE; INHOMOGENEOUS DISCRETE MARKOV PROCESS; VOCABULARY OF AN AUTHOR; LITERARY TEXT

Type: Research Papers
Information: Journal of Applied Probability, Volume 9, Issue 3, June 1972, pp. 507-518

DOI: https://doi.org/10.2307/3212322
Copyright: © Applied Probability Trust 1972
