7 - Text Analysis and Mining

Published online by Cambridge University Press: 14 August 2020

Summary

As has been seen throughout this book, data is not just numbers but also text, and the internet gives access to vast quantities of it. This text ranges from individual keywords, whether entered into search engines or associated with bibliographic records, to huge quantities of unstructured text in the form of web pages, blog posts, microblogging updates and articles in online databases. This chapter considers some of the ways this data may be analysed for insights.

Following a brief discussion of how text analysis may be applied by library and information professionals, the chapter is broadly split into two halves. The first half covers natural language processing and the analysis of chunks of text, discussing machine learning, sentiment analysis and topic analysis. The second half turns to techniques based on keywords or n-grams, namely term frequency and burst detection. The two parts are not unrelated; for example, keywords may have been extracted through natural language processing, or created independently as part of a classification process, whether formal (e.g. applying subject terms) or informal (e.g. tagging).

Text analysis and mining, and information professionals

Of the three approaches to data science considered in this book (clustering and social network analysis, predictions and forecasting, and text analysis and mining), text analysis probably has the most widespread potential for library and information professionals. Each approach has its applications in the library, and there is plenty of overlap between them (e.g. predictive search suggestions), but text analysis offers particularly numerous applications, both because of the pivotal role text has played in the history of the library and because of the huge growth in online content and unstructured data.

The goal of creating a universal library, containing all the books and useful information ever published, is ultimately unattainable, but great strides have been made towards it. The digital revolution has reduced the physical barriers to such a library, bringing vast reductions in the costs of digitising, copying and storing publications. Nonetheless, certain limitations will never be overcome: history is filled with lost works; copyright holders may object to their works being deposited in such an online library; and a variety of grey literature and unpublished manuscripts fall outside any library's collection policy.

Type: Chapter
Publisher: Facet
Print publication year: 2020

  • Text Analysis and Mining
  • David Stuart
  • Book: Practical Data Science for Information Professionals
  • Online publication: 14 August 2020
  • Chapter DOI: https://doi.org/10.29085/9781783303465.008