The process of tagging

Douglas Biber; Susan Conrad; Randi Reppen

doi:10.1017/CBO9780511804489.016

5 - The process of tagging

Published online by Cambridge University Press: 05 June 2012


Affiliations:
Douglas Biber, Northern Arizona University
Susan Conrad, Iowa State University
Randi Reppen, Northern Arizona University


Summary

The analysis process

Most taggers (the programs that tag an uncoded corpus) make use of several kinds of information. First, they have dictionaries that list the category or categories to which a particular word can belong. Some words, such as “the” and “a,” are not ambiguous; they can be automatically identified as the definite and indefinite articles. Other words are ambiguous, such as “deal,” which can be a noun or a verb. Dictionaries can also identify fixed expressions (e.g., identifying the sequence “and so forth” as an adverb or “such that” as a subordinator). Finally, dictionaries can have lists of words that take certain grammatical patterns (e.g., the verbs or nouns that can control complement clauses).
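The dictionary lookup described above can be sketched in a few lines of Python. This is only an illustrative sketch: the lexicon, the fixed-expression table, the tag names, and the function name candidate_tags are all hypothetical and are not taken from any particular tagger.

```python
# Hypothetical toy lexicon: each word maps to the set of tags it can take.
LEXICON = {
    "the": {"ARTICLE"},
    "a": {"ARTICLE"},
    "deal": {"NOUN", "VERB"},  # ambiguous: more than one candidate tag
}

# Hypothetical fixed expressions, keyed by their word sequence.
FIXED_EXPRESSIONS = {
    ("and", "so", "forth"): "ADVERB",
    ("such", "that"): "SUBORDINATOR",
}


def candidate_tags(tokens):
    """Return (text, candidate tag set) pairs, trying fixed expressions first."""
    longest = max(len(key) for key in FIXED_EXPRESSIONS)
    result = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest multi-word expression first, then shorter ones.
        for n in range(longest, 1, -1):
            if i + n > len(tokens):
                continue
            sequence = tuple(t.lower() for t in tokens[i:i + n])
            if sequence in FIXED_EXPRESSIONS:
                result.append((" ".join(tokens[i:i + n]),
                               {FIXED_EXPRESSIONS[sequence]}))
                i += n
                matched = True
                break
        if not matched:
            # Single-word lookup; words missing from the lexicon get UNKNOWN.
            result.append((tokens[i],
                           LEXICON.get(tokens[i].lower(), {"UNKNOWN"})))
            i += 1
    return result


tagged = candidate_tags(["the", "deal", "and", "so", "forth"])
```

Words whose candidate set contains a single tag are settled at this stage; words such as “deal” keep several candidates and need a later disambiguation step.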

For words that are ambiguous, many taggers make use of probabilistic information. This information is derived from corpora that have already been tagged accurately (such as the LOB, for which all the grammatical tags were checked). The probabilistic information tells the tagger how likely it is that a given word belongs to one class or another. “Book,” for instance, can be a verb or a noun, but it has a much higher probability of occurring as a noun.
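One simple way to obtain such word-level probabilities is to count how often each word carries each tag in a tagged corpus and convert the counts to relative frequencies. The sketch below assumes this approach; the tiny sample, its counts, and the function names are invented for illustration and do not come from the LOB or any real corpus.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made sample standing in for a checked, tagged corpus.
TAGGED_SAMPLE = [
    ("book", "NOUN"), ("book", "NOUN"), ("book", "NOUN"), ("book", "VERB"),
    ("deal", "NOUN"), ("deal", "VERB"),
]


def lexical_probabilities(tagged_words):
    """Estimate P(tag | word) as a relative frequency over the tagged sample."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word.lower()][tag] += 1
    return {
        word: {tag: n / sum(tag_counts.values())
               for tag, n in tag_counts.items()}
        for word, tag_counts in counts.items()
    }


def most_likely_tag(word, probabilities):
    """Pick the tag with the highest estimated probability for a known word."""
    tag_probabilities = probabilities[word.lower()]
    return max(tag_probabilities, key=tag_probabilities.get)


probabilities = lexical_probabilities(TAGGED_SAMPLE)
best_for_book = most_likely_tag("Book", probabilities)
```

With this invented sample, “book” is tagged as a noun in three of its four occurrences, so the noun reading is preferred when no other evidence is available.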

Probabilities can also be applied to a sequence of tags. For example, to disambiguate “respect” in the phrase “in respect of the,” the tagger would compare the probability of a preposition-verb-preposition sequence with that of a preposition-noun-preposition sequence.
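The comparison of tag sequences can be sketched as follows. The sketch assumes that the probability of a sequence is approximated by multiplying the probabilities of adjacent tag pairs, which is one common simplification and not necessarily the method used by any given tagger. All of the transition values are invented for illustration, and the function names are hypothetical.

```python
# Hypothetical transition probabilities P(next tag | previous tag).
# The numbers are invented; a real tagger would estimate them from a corpus.
TRANSITIONS = {
    ("PREP", "NOUN"): 0.30,
    ("PREP", "VERB"): 0.02,
    ("NOUN", "PREP"): 0.25,
    ("VERB", "PREP"): 0.15,
}


def sequence_score(tags):
    """Multiply the transition probabilities of each adjacent tag pair."""
    score = 1.0
    for previous, following in zip(tags, tags[1:]):
        # Pairs missing from the table are treated as impossible (score 0).
        score *= TRANSITIONS.get((previous, following), 0.0)
    return score


def best_tag_for_slot(left, candidates, right):
    """Choose the candidate tag that gives the most probable three-tag sequence."""
    return max(candidates,
               key=lambda tag: sequence_score([left, tag, right]))


# "in respect of": preposition, ambiguous word, preposition.
choice = best_tag_for_slot("PREP", ["NOUN", "VERB"], "PREP")
```

Under these invented values the preposition-noun-preposition sequence scores higher than the preposition-verb-preposition sequence, so “respect” would be tagged as a noun.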

Type: Chapter
Information: Corpus Linguistics: Investigating Language Structure and Use, pp. 261-262

DOI: https://doi.org/10.1017/CBO9780511804489.016

Publisher: Cambridge University Press

Print publication year: 1998
