On the relation between types and tokens in literary text
Published online by Cambridge University Press: 14 July 2016
Abstract
The ratio of the number Xn of different words (types) in a text to its length n in words (tokens) has received considerable attention in the literature of statistical linguistics. The present note contains two stochastic models for Xn based on an inhomogeneous discrete Markov process of the pure birth type, in which the transition probabilities take certain forms depending only upon n. These models are then tested against data obtained from the plays of William Shakespeare.
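To make the kind of process described in the abstract concrete, the following is a minimal simulation sketch of an inhomogeneous pure-birth chain for Xn: at each token the type count either stays the same or increases by one, with a probability that depends only on n. The abstract does not state the two transition forms used in the note, so the power-law `birth_probability` below, its parameter `alpha`, and all function names are hypothetical choices for illustration only.

```python
import random


def birth_probability(n, alpha=0.5):
    """Hypothetical chance that the next token is a new type after n tokens.

    This power-law form is an assumption for illustration; it is not one of
    the two forms proposed in the note.
    """
    return 1.0 if n == 0 else min(1.0, n ** (-alpha))


def simulate_types(length, alpha=0.5, seed=0):
    """Simulate the type counts X_1, ..., X_length of a pure-birth chain."""
    rng = random.Random(seed)
    types_so_far = 0
    path = []
    for n in range(length):
        # Pure birth: the count never decreases and grows by at most one.
        if rng.random() < birth_probability(n, alpha):
            types_so_far += 1
        path.append(types_so_far)
    return path


def expected_types(length, alpha=0.5):
    """E[X_length] is the sum of the per-step birth probabilities."""
    return sum(birth_probability(n, alpha) for n in range(length))


if __name__ == "__main__":
    path = simulate_types(10_000)
    for n in (100, 1_000, 10_000):
        # Type-token ratio X_n / n at a few text lengths.
        print(n, path[n - 1], path[n - 1] / n)
```

Because the transition probabilities depend only on n and not on the current state, the expected number of types is simply the sum of those probabilities, which is what `expected_types` computes for comparison with a simulated path.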
Copyright © Applied Probability Trust 1972