Published online by Cambridge University Press: 13 July 2017
In this chapter we focus on the systematization, analysis, and discussion of recent trends in machine learning methods for social signal processing (SSP) (Pentland, 2007). Social signaling is often central to the subconscious decision making that affects everyday tasks (e.g., decisions about risks and rewards, resource utilization, or interpersonal relationships). Enabling computers to understand social signals automatically is therefore of paramount importance. Machine learning has played a prominent role in the advancement of SSP over the past decade. This is, in part, due to the exponential increase in data availability, which served as a catalyst for the adoption of a new, data-driven direction in affective computing. Because the latent and complex physical processes that underpin social signals are difficult to model exactly, data has long served as the means to circumvent or supplement expert- or physics-based models. Examples include deformable musculoskeletal models of the human body, face, or hands and their movement, neuro-dynamical models of cognitive perception, and models of human vocal production. This trend parallels the role and success of machine learning in related areas, such as computer vision (cf. Poppe, 2010; Wright et al., 2010; Grauman & Leibe, 2011) or audio, speech, and language processing (cf. Deng & Li, 2013), which serve as the core tools for analytic SSP tasks. Rather than exhaustively covering the many approaches to data-driven SSP, which can be found in excellent surveys (Vinciarelli, Pantic, & Bourlard, 2009; Vinciarelli et al., 2012), we seek to present the methods in the context of current modeling challenges. In particular, we identify and discuss two major modeling directions:
• Simultaneous modeling of social signals and context, and
• Modeling of annotators and the data annotation process.
Context plays a crucial role in understanding human behavioral signals, which can otherwise be easily misinterpreted. For instance, a smile can be a display of politeness, contentedness, joy, irony, or empathy, or a greeting, depending on the context. Yet most SSP methods to date focus on the simpler problem of detecting a smile as a prototypical, self-contained signal.