Book contents
- Frontmatter
- Content
- Acknowledgements
- 1 Introduction
- 2 What is a thesaurus?
- 3 Tools for subject access and retrieval
- 4 What a thesaurus is used for
- 5 Why use a thesaurus?
- 6 Types of thesaurus
- 7 The format of a thesaurus
- 8 Building a thesaurus 1: vocabulary collection
- 9 Vocabulary control 1: selection of terms
- 10 Vocabulary control 2: form of entry
- 11 Building a thesaurus 2: term extraction from document titles
- 12 Building a thesaurus 3: vocabulary analysis
- 13 The thesaural relationships
- 14 Building a thesaurus 4: introducing internal structure
- 15 Building a thesaurus 5: imposing hierarchy
- 16 Building a thesaurus 6: compound subjects and citation order
- 17 Building a thesaurus 7: conversion of the taxonomy to alphabetical format
- 18 Building a thesaurus 8: creating the thesaurus records
- 19 Managing and maintaining the thesaurus: thesaurus software
- 20 Conclusion
- Glossary
- Bibliography
- Appendix 1 Sample titles for thesaurus vocabulary
- Appendix 2 Sample terms for the thesaurus
- Appendix 3 Facets at stage 1 of analysis
- Appendix 4 Facets at stage 2 of analysis
- Appendix 5 Completed systematic display
- Appendix 6 Thesaurus entries for sample page
- Index
11 - Building a thesaurus 2: term extraction from document titles
Published online by Cambridge University Press: 09 June 2018
Summary
Following the systematic searches on catalogues and databases, we now have about 150 titles which will form the basis of the working vocabulary. Because these were selected carefully to avoid duplication of terms where possible, this should provide us with about 400–500 terms. This is really as big a vocabulary as one can comfortably manage in the initial stages, and there are quite enough terms to establish a sound and reliable structure for the thesaurus. There will probably be gaps in the terminology, but it is more efficient to fill these in at a later stage if necessary. In the majority of cases, far fewer terms than this can provide a reasonable structure for the thesaurus, so if you are dealing with a small, specialist vocabulary you can manage with around 100 terms as a starting point.
The titles must now be analysed to identify relevant terms. I find it easiest to do this in a rather rough and ready way, using a list of the titles, and cutting and pasting the relevant terms into another document. Some level of vocabulary control can be imposed as you go along and, at the end, the extracted terms are easily sorted using the A–Z sort facility of a word processing package.
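For readers who would rather script the final tidy-up than rely on a word processor, the sort step might be sketched as follows. This is only an illustration: the function name `sort_terms` and the sample list are assumptions, and the only vocabulary control applied is whitespace clean-up and case-insensitive removal of duplicates.

```python
def sort_terms(extracted):
    """Return the unique terms in A-Z order, ignoring case when comparing."""
    seen = {}
    for term in extracted:
        cleaned = " ".join(term.split())  # collapse stray whitespace
        key = cleaned.casefold()          # compare without regard to case
        if cleaned and key not in seen:
            seen[key] = cleaned           # keep the first spelling encountered
    return sorted(seen.values(), key=str.casefold)


# Hypothetical sample of terms pasted from the titles list
pasted = ["overpopulation", "cat", "United States", "grass", " Cat ", "domestic rabbits"]
print(sort_terms(pasted))
# ['cat', 'domestic rabbits', 'grass', 'overpopulation', 'United States']
```

Sorting with a case-insensitive key keeps proper names such as United States in their expected alphabetical position rather than grouping all capitalised terms first.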
Identification of significant terms
Let's start by picking out the key concepts in each title. We will look at some examples of titles from our list, starting with a very straightforward one:
Cat overpopulation in the United States
Here there are three important terms:
cat – overpopulation – United States
A similar straightforward example is:
Preference of domestic rabbits for grass or coarse mix feeds
Domestic rabbits – grass – coarse mix feeds
You will notice that I did not select ‘preference’ as a term to be used. Vague or general terms of this kind are not normally used in indexing, and the purpose of the exercise is to identify significant terms. If you find it hard to decide what is significant, think about the kinds of terms that an end-user is likely to search for. A useful tip is to look for nouns or noun phrases first, and then for any significant verbs. You will remember from the previous chapter that most thesaurus terms fall into these two categories.
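A rough first pass over the titles can also be automated, although the result is only a list of candidates for a human to review. The sketch below is not the method used in this chapter: it simply splits each title at common function words and discards any phrase found on a list of vague terms. The names `FUNCTION_WORDS`, `VAGUE_TERMS` and `candidate_phrases` are hypothetical, and both word lists are short assumptions that would need extending by hand.

```python
import re

# Hypothetical lists for illustration; a real project would tune both by hand.
FUNCTION_WORDS = {"a", "an", "and", "for", "in", "of", "on", "or", "the", "to"}
VAGUE_TERMS = {"preference"}


def candidate_phrases(title):
    """Split a title at function words and drop phrases listed as vague."""
    phrases, current = [], []
    for word in re.findall(r"[A-Za-z][A-Za-z'-]*", title):
        if word.lower() in FUNCTION_WORDS:
            if current:                      # a function word closes the phrase
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(word)
    if current:                              # flush the final phrase
        phrases.append(" ".join(current))
    return [p for p in phrases if p.lower() not in VAGUE_TERMS]


print(candidate_phrases("Preference of domestic rabbits for grass or coarse mix feeds"))
# ['domestic rabbits', 'grass', 'coarse mix feeds']
print(candidate_phrases("Cat overpopulation in the United States"))
# ['Cat overpopulation', 'United States']
```

The second title shows the limits of the approach: the script returns "Cat overpopulation" as one phrase, and it still takes an indexer's judgement to separate it into the two terms cat and overpopulation.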
- Type: Chapter
- Information: Essential Thesaurus Construction, pp. 99–106. Publisher: Facet. Print publication year: 2006