
On the relation between types and tokens in literary text

Published online by Cambridge University Press:  14 July 2016

Barron Brainerd
Affiliation:
University of Toronto

Abstract

The ratio Xn/n, where Xn is the number of different words (types) in a text of n words (tokens), has received considerable attention in the literature of statistical linguistics. The present note gives two stochastic models for Xn, each based on an inhomogeneous discrete Markov process of the pure birth type in which the transition probabilities take certain forms depending only upon n. These models are then tested against data obtained from the plays of William Shakespeare.
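
As a rough illustration of the quantities named in the abstract, the Python sketch below counts Xn for a tokenized text and simulates a pure birth chain in which Xn either stays put or grows by one at each step. The abstract does not state the forms of the transition probabilities, so `example_birth_prob` and its constant 0.1 are hypothetical placeholders chosen only to make the sketch runnable; they are not the models of the paper. The function names are likewise assumptions.

```python
import random


def type_token_counts(tokens):
    """Return [X_1, ..., X_n], where X_k is the number of distinct
    types among the first k tokens."""
    seen = set()
    counts = []
    for tok in tokens:
        seen.add(tok)
        counts.append(len(seen))
    return counts


def example_birth_prob(k):
    # Hypothetical placeholder, NOT taken from the paper: the chance that
    # token k + 1 is a new type, decreasing in k and depending only on k.
    return 1.0 / (1.0 + 0.1 * k)


def simulate_pure_birth(n, birth_prob, seed=0):
    """Simulate X_1, ..., X_n for a pure birth chain in which
    X_{k+1} = X_k + 1 with probability birth_prob(k), else X_{k+1} = X_k."""
    rng = random.Random(seed)
    x = 1  # the first token is always a new type
    path = [x]
    for k in range(1, n):
        if rng.random() < birth_prob(k):
            x += 1
        path.append(x)
    return path


if __name__ == "__main__":
    observed = type_token_counts("to be or not to be".split())
    print(observed)                        # cumulative type counts
    print(observed[-1] / len(observed))    # type-token ratio X_n / n
    print(simulate_pure_birth(20, example_birth_prob))
```

Comparing an observed sequence from `type_token_counts` with simulated paths is one informal way to see how a choice of transition probabilities shapes the growth of Xn; the paper's own test against the Shakespeare data is not reproduced here.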

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 1972 

