Sampling techniques for audio-visual tracking and head pose estimation

doi:10.1017/CBO9781139136310.006

6 - Sampling techniques for audio-visual tracking and head pose estimation

Published online by Cambridge University Press: 05 July 2012

Jean-Marc Odobez and Oswald Lanz

Edited by

Steve Renals, Hervé Bourlard, Jean Carletta and Andrei Popescu-Belis

Affiliations:
Jean-Marc Odobez: Idiap Research Institute, Martigny, Switzerland
Oswald Lanz: FBK-IRST, Trento, Italy
Steve Renals: University of Edinburgh
Hervé Bourlard: Idiap Research Institute
Jean Carletta: University of Edinburgh
Andrei Popescu-Belis: Idiap Research Institute, Martigny, Switzerland

Summary

Introduction

Analyzing the behavior of people in smart environments using multimodal sensors requires answering a set of typical questions: Who are the people? Where are they? What activities are they doing? When? With whom are they interacting? And how are they interacting? In this view, locating people or their faces and characterizing them (e.g. extracting their body or head orientation) allows us to address the first two questions (who and where), and is usually one of the first steps before applying higher-level multimodal scene analysis algorithms that address the other questions. In the last ten years, tracking algorithms have made considerable progress, particularly in indoor environments or for specific applications, where they have reached a maturity that allows their deployment in real systems and applications. Nevertheless, several issues can still make tracking difficult: background clutter and potentially small object size; complex shape, appearance, and motion, and their changes over time or across camera views; inaccurate or rough scene calibration, or inconsistent camera calibration between views for 3D tracking; and real-time processing requirements. In what follows, we discuss some important aspects of tracking algorithms and introduce the content of the rest of the chapter.

Scenarios and Set-ups. Scenarios and application needs strongly influence the physical environment considered, and therefore the set-up (where sensors are placed, how many, and of what type) and the choice of tracking method. A first set of scenarios commonly involves the tracking of people in so-called smart spaces (Singh et al., 2006).

Type: Chapter
Information: Multimodal Signal Processing: Human Interactions in Meetings, pp. 84-102

DOI: https://doi.org/10.1017/CBO9781139136310.006

Publisher: Cambridge University Press

Print publication year: 2012
