Speech recognition

doi:10.1017/CBO9781139136310.005

5 - Speech recognition

Published online by Cambridge University Press: 05 July 2012

By Thomas Hain and Philip N. Garner

Edited by Steve Renals, Hervé Bourlard, Jean Carletta and Andrei Popescu-Belis

Affiliations:
Thomas Hain: University of Sheffield, UK
Philip N. Garner: Idiap Research Institute, Martigny, Switzerland
Steve Renals: University of Edinburgh
Hervé Bourlard: Idiap Research Institute
Jean Carletta: University of Edinburgh
Andrei Popescu-Belis: Idiap Research Institute, Martigny, Switzerland

Summary

General overview

Meetings are a rich source of information that, in practice, is mostly untouched by any form of information processing. Even now meetings are rarely recorded, and fewer still are then annotated for access purposes. Examples of the latter are limited to meetings held in parliaments, courts, hospitals, banks, and similar settings, where a record is required for decision tracking or to meet legal obligations. In these cases a labor-intensive manual transcription of the spoken words is produced. Giving much wider access to this rich content is the main aim of the AMI consortium projects, and there are now many examples of interest in that access, visible in the release of commercial hardware and software services. Especially with the advent of high-quality telephone and videoconferencing systems, the opportunity to record, process, recognize, and categorize the interactions in meetings is acknowledged even by skeptics of speech and language processing technology.

Of course, meetings are an audio-visual experience by nature, and humans make extensive use of visual and other sensory information. Illustrating this rich landscape of information is the purpose of this book, and many applications can be implemented without looking at the spoken word at all. However, verbal communication still forms the backbone of most meetings and accounts for the bulk of the information transferred between participants. Hence automatic speech recognition (ASR) is key to accessing the information exchanged and is the most important component for most higher-level processing.

Type: Chapter
Information: Multimodal Signal Processing: Human Interactions in Meetings, pp. 56-83

DOI: https://doi.org/10.1017/CBO9781139136310.005

Publisher: Cambridge University Press

Print publication year: 2012
