  • Print publication year: 2017
  • Online publication date: July 2017

18 - Machine Learning Methods for Social Signal Processing

from Part II - Machine Analysis of Social Signals


Introduction

In this chapter we systematize, analyze, and discuss recent trends in machine learning methods for social signal processing (SSP) (Pentland, 2007). Because social signaling is often central to the subconscious decision making that shapes everyday tasks (e.g., decisions about risks and rewards, resource utilization, or interpersonal relationships), automated understanding of social signals by computers is a task of paramount importance. Machine learning has played a prominent role in the advancement of SSP over the past decade. This is, in part, due to the exponential increase in data availability, which served as a catalyst for the adoption of a new data-driven direction in affective computing. Given the difficulty of exactly modeling the latent and complex physical processes that underpin social signals, data has emerged as a means to circumvent or supplement expert- or physics-based models, such as deformable musculoskeletal models of the human body, face, or hands and their movement, neuro-dynamical models of cognitive perception, or models of human vocal production. This trend parallels the role and success of machine learning in related areas, such as computer vision (cf. Poppe, 2010; Wright et al., 2010; Grauman & Leibe, 2011) or audio, speech, and language processing (cf. Deng & Li, 2013), which serve as the core tools for analytic SSP tasks. Rather than attempt exhaustive coverage of the many approaches to data-driven SSP, which can be found in excellent surveys (Vinciarelli, Pantic, & Bourlard, 2009; Vinciarelli et al., 2012), we present the methods in the context of current modeling challenges. In particular, we identify and discuss two major modeling directions:

• Simultaneous modeling of social signals and context, and

• Modeling of annotators and the data annotation process.
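The second direction can be illustrated with a toy sketch. Given labels from several annotators of unknown reliability, a simple EM-style reweighting, in the spirit of Raykar et al. (2010) but simplified to a single accuracy parameter per annotator, recovers per-annotator reliabilities instead of trusting a plain majority vote. The label matrix below is invented for illustration, with the third annotator made deliberately noisy.

```python
import numpy as np

# Toy annotations: rows = items, columns = annotators (binary labels).
# All values are invented; annotator 2 disagrees with the others often.
labels = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
])

n_items, n_annot = labels.shape
p = np.full(n_items, 0.5)      # posterior P(true label = 1) per item
acc = np.full(n_annot, 0.7)    # per-annotator accuracy estimates

for _ in range(20):
    # E-step: posterior over each item's true label given accuracies.
    like1 = np.prod(np.where(labels == 1, acc, 1 - acc), axis=1)
    like0 = np.prod(np.where(labels == 0, acc, 1 - acc), axis=1)
    p = like1 / (like1 + like0)
    # M-step: accuracy = expected fraction of items each annotator matched.
    agree1 = (labels == 1) * p[:, None]
    agree0 = (labels == 0) * (1 - p)[:, None]
    acc = (agree1 + agree0).sum(axis=0) / n_items

print(np.round(acc, 2))  # annotator 2's estimate ends up far below the others
```

Unlike majority voting (Lam & Suen, 1997), the estimated reliabilities can then weight each annotator's contribution to the inferred ground truth.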

Context plays a crucial role in understanding the human behavioral signals that can otherwise be easily misinterpreted. For instance, a smile can be a display of politeness, contentedness, joy, irony, empathy, or a greeting, depending on the context. Yet, most SSP methods to date focus on the simpler problem of detecting a smile as a prototypical and self-contained signal.
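The point can be made concrete with a deliberately minimal sketch: the same detected smile maps to different interpretations once a context variable is conditioned on. The context categories and probability values below are invented for illustration, not drawn from any dataset or from the methods surveyed here.

```python
# Hypothetical conditional distributions P(meaning | smile, context).
# Contexts and probabilities are illustrative only.
P_MEANING = {
    "greeting":  {"politeness": 0.6, "joy": 0.3, "irony": 0.1},
    "joke":      {"politeness": 0.1, "joy": 0.7, "irony": 0.2},
    "criticism": {"politeness": 0.2, "joy": 0.1, "irony": 0.7},
}

def interpret_smile(context: str) -> str:
    """Return the most probable meaning of a detected smile in `context`."""
    dist = P_MEANING[context]
    return max(dist, key=dist.get)

for ctx in P_MEANING:
    print(ctx, "->", interpret_smile(ctx))
```

A context-blind smile detector collapses these three distinct outcomes into a single label; joint modeling of signal and context is what the first direction above aims to restore.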

Social Signal Processing
  • Online ISBN: 9781316676202
  • Book DOI: https://doi.org/10.1017/9781316676202
Amin, M. A., Afzulpurkar, N. V., Dailey, M. N., Esichaikul, V., & Batanov, D. N. (2005). Fuzzy-CMean determines the principle component pairs to estimate the degree of emotion from facial expressions. In 2nd International Conference on Fuzzy Systems and Knowledge Discovery (pp. 484–493), Changsha, China.
Bach, F. R. & Jordan, M. I. (2005). A probabilistic interpretation of canonical correlation analysis. Technical Report 688, Department of Statistics, University of California.
Bartlett, M., Littlewort, G., Frank, M., et al. (2005). Recognizing facial expression: Machine learning and application to spontaneous behavior. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 568–573), San Diego, CA.
Bartlett, M., Littlewort, G., Frank, M., et al. (2006). Fully automatic facial action recognition in spontaneous behavior. In Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (pp. 223–230), Southampton, UK.
Bazzo, J. & Lamar, M. (2004). Recognizing facial actions using Gabor wavelets with neutral face average difference. In Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (pp. 505–510), Seoul.
Black, M. J. & Yacoob, Y. (1997). Recognizing facial expressions in image sequences using local parameterized models of image motion. International Journal of Computer Vision, 25, 23–48.
Cai, D., He, X., & Han, J. (2007). Spectral regression for efficient regularized subspace learning. In Proceedings of IEEE International Conference on Computer Vision (pp. 1–8), Brazil.
Chang, K.-Y., Liu, T.-L., & Lai, S.-H. (2009). Learning partially observed hidden conditional random fields for facial expression recognition. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 533–540), Miami, FL.
Chew, S., Lucey, P., Lucey, S., et al. (2012). In the pursuit of effective affective computing: The relationship between features and registration. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 42(4), 1006–1016.
Chu, W. & Ghahramani, Z. (2005). Gaussian processes for ordinal regression. Journal of Machine Learning Research, 6, 1019–1041.
Chu, W. & Keerthi, S. S. (2005). New approaches to support vector ordinal regression. In Proceedings of the 22nd International Conference on Machine Learning (pp. 145–152), Bonn, Germany.
Chu, W.-S., De la Torre, F., & Cohn, J. (2013). Selective transfer machine for personalized facial action unit detection. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 3515–3522), Portland, OR.
Cohen, I., Sebe, N., Chen, L., Garg, A., & Huang, T. S. (2003). Facial expression recognition from video sequences: Temporal and static modelling. Computer Vision and Image Understanding, 92(1–2), 160–187.
Cowie, R., Douglas-Cowie, E., & Cox, C. (2005). Beyond emotion archetypes: Databases for emotion modelling using neural networks. Neural Networks, 18(4), 371–388.
Cowie, R., Douglas-Cowie, E., Savvidou, S., et al. (2000). “FEELTRACE”: An instrument for recording perceived emotion in real time. In Proceedings of the ISCA Workshop on Speech and Emotion (pp. 19–24), Belfast.
Dai, P., Mausam, & Weld, D. S. (2010). Decision-theoretic control of crowd-sourced workflows. In Proceedings of the 24th National Conference on Artificial Intelligence (pp. 1168–1174), Atlanta, GA.
Dai, P., Mausam, & Weld, D. S. (2011). Artificial intelligence for artificial artificial intelligence. In Proceedings of 25th AAAI Conference on Artificial Intelligence (pp. 1153–1159), San Francisco.
Delannoy, J. & McDonald, J. (2008). Automatic estimation of the dynamics of facial expression using a three-level model of intensity. In Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (pp. 1–6), Amsterdam.
De Leeuw, J. (2006). Principal component analysis of binary data by iterated singular value decomposition. Computational Statistics and Data Analysis, 50(1), 21–39.
Deng, L. & Li, X. (2013). Machine learning paradigms for speech recognition: An overview. IEEE Transactions on Audio, Speech, and Language Processing, 21(5), 1060–1089.
Van der Maaten, L. & Hendriks, E. (2012). Action unit classification using active appearance models and conditional random fields. Cognitive Processing, 13(2), 507–518.
Douglas-Cowie, E., Campbell, N., Cowie, R., & Roach, P. (2003). Emotional speech: Towards a new generation of databases. Speech Communication, 40(1), 33–60.
Ekman, P., Friesen, W., & Hager, J. (2002). Facial Action Coding System (FACS): Manual. Salt Lake City, UT: A Human Face.
Ekman, P., Friesen, W. V., & Press, C. P. (1975). Pictures of Facial Affect. Palo Alto, CA: Consulting Psychologists Press.
Fasel, B. & Luettin, J. (2000). Recognition of asymmetric facial action unit activities and intensities. In Proceedings of 15th International Conference on Pattern Recognition (pp. 110–1103), Barcelona, Spain.
Gholami, B., Haddad, W. M., & Tannenbaum, A. R. (2009). Agitation and pain assessment using digital imaging. In Proceedings of International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 2176–2179), Minneapolis, MN.
Grauman, K. & Leibe, B. (2011). Visual object recognition. Synthesis Lectures on Artificial Intelligence and Machine Learning, 5(2), 1–181.
Gunes, H. & Piccardi, M. (2009). Automatic temporal segment detection and affect recognition from face and body display. IEEE Transactions on Systems, Man, and Cybernetics, 39(1), 64–84.
Gunes, H., Piccardi, M., & Pantic, M. (2008). From the lab to the real world: Affect recognition using multiple cues and modalities. In J. Or (Ed.), Affective Computing [e-book]. www.intechopen.com/books/affective_computing.
Hamid, J., Meaney, C., Crowcroft, N., et al. (2011). Potential risk factors associated with human encephalitis: Application of canonical correlation analysis. BMC Medical Research Methodology, 11(1), 1–10.
Hamm, J., Kohler, C. G., Gur, R. C., & Verma, R. (2011). Automated facial action coding system for dynamic analysis of facial expressions in neuropsychiatric disorders. Journal of Neuroscience Methods, 200(2), 237–256.
Hammal, Z. & Cohn, J. F. (2012). Automatic detection of pain intensity. In Proceedings of the 14th ACM International Conference on Multimodal Interaction (pp. 47–52), Santa Monica, CA.
He, X. & Niyogi, P. (2004). Locality preserving projections. In Proceedings of Neural Information Processing Systems (vol. 16), Vancouver, Canada.
Hess, U., Blairy, S., & Kleck, R. (1997). The intensity of emotional facial expressions and decoding accuracy. Journal of Nonverbal Behavior, 21(4), 241–257.
Hu, C., Chang, Y., Feris, R., & Turk, M. (2004). Manifold based analysis of facial expression. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop (p. 81).
Jain, S., Hu, C., & Aggarwal, J. (2011). Facial expression recognition with temporal modeling of shapes. In IEEE International Conference on Computer Vision Workshops (pp. 1642–1649), Barcelona, Spain.
Jeni, L. A., Girard, J. M., Cohn, J. F., & De la Torre, F. (2013). Continuous AU intensity estimation using localized, sparse facial feature space. In IEEE International Conference on Automatic Face and Gesture Recognition (pp. 1–7).
Kaltwang, S., Rudovic, O., & Pantic, M. (2012). Continuous pain intensity estimation from facial expressions. Lecture Notes in Computer Science (ISVC), 7432, 368–377.
Kapoor, A., Qi, Y. A., & Picard, R. W. (2003). Fully automatic upper facial action recognition. In Proceedings of IEEE International Workshop on Analysis and Modeling of Faces and Gestures (pp. 195–202).
Khademi, M., Manzuri-Shalmani, M. T., Kiapour, M. H., & Kiaei, A. A. (2010). Recognizing combinations of facial action units with different intensity using a mixture of hidden Markov models and neural network. In Proceedings of the 9th International Conference on Multiple Classifier Systems (pp. 304–313).
Kim, M. & Pavlovic, V. (2010). Structured output ordinal regression for dynamic facial emotion intensity prediction. In Proceedings of 11th European Conference on Computer Vision (pp. 649–662), Heraklion, Crete.
Kimura, S. & Yachida, M. (1997). Facial expression recognition and its degree estimation. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 295–300), Puerto Rico.
Klami, A. & Kaski, S. (2008). Probabilistic approach to detecting dependencies between data sets. Neurocomputing, 72(1), 39–46.
Koelstra, S., Pantic, M., & Patras, I. (2010). A dynamic texture based approach to recognition of facial actions and their temporal models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1940–1954.
Lam, L. & Suen, S. (1997). Application of majority voting to pattern recognition: An analysis of its behavior and performance. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 27(5), 553–568.
Lee, C. S. & Elgammal, A. (2005). Facial expression analysis using nonlinear decomposable generative models. In Proceedings of IEEE International Workshops on Analysis and Modeling of Faces and Gestures (pp. 17–31).
Lee, K. K. & Xu, Y. (2003). Real-time estimation of facial expression intensity. In Proceedings of IEEE International Conference on Robotics and Automation (pp. 2567–2572), Taipei.
Lu, H., Plataniotis, K. N., & Venetsanopoulos, A. N. (2011). A survey of multilinear subspace learning for tensor data. Pattern Recognition, 44(7), 1540–1551.
Lucey, P., Cohn, J., Prkachin, K., Solomon, P., & Matthews, I. (2011). Painful data: The UNBC-McMaster shoulder pain expression archive database. In Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (pp. 57–64), Santa Barbara, CA.
Mahoor, M., Cadavid, S., Messinger, D., & Cohn, J. (2009). A framework for automated measurement of the intensity of non-posed facial action units. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop (pp. 74–8), Miami, FL.
Mariooryad, S. & Busso, C. (2013). Analysis and compensation of the reaction lag of evaluators in continuous emotional annotations. In Proceedings of Humaine Association Conference on Affective Computing and Intelligent Interaction (pp. 97–108), Switzerland.
Mavadati, S., Mahoor, M., Bartlett, K., Trinh, P., & Cohn, J. (2013). DISFA: A spontaneous facial action intensity database. IEEE Transactions on Affective Computing, 4(2), 151–160.
McKeown, G., Valstar, M., Cowie, R., Pantic, M., & Schroder, M. (2012). The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent. IEEE Transactions on Affective Computing, 3(1), 5–17.
Metallinou, A., Katsamanis, A., Wang, Y., & Narayanan, S. (2011). Tracking changes in continuous emotion states using body language and prosodic cues. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 2288–2291), Prague.
Metallinou, A., Lee, C.-C., Busso, C., Carnicke, S., & Narayanan, S. (2010). The USC CreativeIT database: A multimodal database of theatrical improvisation. In Proceedings of the Multimodal Corpora Workshop: Advances in Capturing, Coding and Analyzing Multimodality (pp. 64–68), Malta.
Nicolaou, M. A., Gunes, H., & Pantic, M. (2010). Automatic segmentation of spontaneous data using dimensional labels from multiple coders. In Proceedings of LREC International Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, Valletta, Malta.
Nicolaou, M. A., Gunes, H., & Pantic, M. (2011). Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space. IEEE Transactions on Affective Computing, 2(2), 92–105.
Nicolaou, M. A., Pavlovic, V., & Pantic, M. (2012). Dynamic probabilistic CCA for analysis of affective behaviour. In Proceedings of the 12th European Conference on Computer Vision (pp. 98–111), Florence, Italy.
Niitsuma, H. & Okada, T. (2005). Covariance and PCA for categorical variables. In T. Ho, D. Cheung, & H. Liu (Eds), Advances in Knowledge Discovery and Data Mining (pp. 523–528). Berlin: Springer.
Otsuka, T. & Ohya, J. (1997). Recognizing multiple persons' facial expressions using HMM based on automatic extraction of significant frames from image sequences. In Proceedings of International Conference on Image Processing (pp. 546–549), Santa Barbara, CA.
Padgett, C. & Cottrell, G. W. (1996). Representing face images for emotion classification. In Proceedings 10th Annual Conference on Neural Information Processing Systems (pp. 894–900), Denver, CO.
Pan, S. J. & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.
Pantic, M. & Bartlett, M. (2007). Machine analysis of facial expressions. In K. Delac & M. Grgic (Eds), Face Recognition [e-book]. http://www.intechopen.com/books/face_recognition.
Pantic, M. & Patras, I. (2005). Detecting facial actions and their temporal segments in nearly frontal-view face image sequences. In Proceedings of IEEE International Conference on Systems, Man and Cybernetics (pp. 3358–3363), Waikoloa, HI.
Pantic, M. & Patras, I. (2006). Dynamics of facial expression: Recognition of facial actions and their temporal segments from face profile image sequences. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 36(2), 433–449.
Pantic, M. & Rothkrantz, L. J. (2004). Facial action recognition for facial expression analysis from static face images. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 34(3), 1449–1461.
Pentland, A. (2007). Social signal processing. IEEE Signal Processing Magazine, 24(4), 108–111.
Poppe, R. (2010). A survey on vision-based human action recognition. Image and Vision Computing, 28(6), 976–990.
Posner, J., Russell, J. A., & Peterson, B. S. (2005). The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology. Development and Psychopathology, 17(3), 715–734.
Quinn, A. J. & Bederson, B. B. (2011). Human computation: A survey and taxonomy of a growing field. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1403–1412), Vancouver.
Raykar, V. C., Yu, S., Zhao, L. H., et al. (2009). Supervised learning from multiple experts: Whom to trust when everyone lies a bit. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 889–896), Montreal.
Raykar, V. C., Yu, S., Zhao, L. H., et al. (2010). Learning from crowds. Journal of Machine Learning Research, 11, 1297–1322.
Reilly, J., Ghent, J., & McDonald, J. (2006). Investigating the dynamics of facial expression. Lecture Notes in Computer Science, 4292, 334–343.
Rudovic, O., Pavlovic, V., & Pantic, M. (2012a). Kernel conditional ordinal random fields for temporal segmentation of facial action units. In Proceedings of 12th European Conference on Computer Vision (pp. 260–269), Florence, Italy.
Rudovic, O., Pavlovic, V., & Pantic, M. (2012b). Multi-output Laplacian dynamic ordinal regression for facial expression recognition and intensity estimation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 2634–2641), Providence, RI.
Rudovic, O., Pavlovic, V., & Pantic, M. (2013a). Automatic pain intensity estimation with heteroscedastic conditional ordinal random fields. In Proceedings of 9th International Symposium on Advances in Visual Computing (pp. 234–243), Rethymnon, Crete.
Rudovic, O., Pavlovic, V., & Pantic, M. (2013b). Context-sensitive conditional ordinal random fields for facial action intensity estimation. In Proceedings of IEEE International Conference on Computer Vision Workshops (pp. 492–499), Sydney.
Ruta, D. & Gabrys, B. (2005). Classifier selection for majority voting. Information Fusion, 6(1), 63–81.
Savrana, A., Sankur, B., & Bilge, M. (2012). Regression-based intensity estimation of facial action units. Image and Vision Computing, 30(10), 774–784.
Shan, C. (2007). Inferring facial and body language. PhD thesis, University of London.
Shan, C., Gong, S., & McOwan, P. W. (2005). Appearance manifold of facial expression. Lecture Notes in Computer Science, 3766, 221–230.
Shan, C., Gong, S., & McOwan, P. W. (2006). Dynamic facial expression recognition using a Bayesian temporal manifold model. In Proceedings of the British Machine Vision Conference (pp. 297–306), Edinburgh.
Shan, C., Gong, S., & McOwan, P. W. (2009). Facial expression recognition based on local binary patterns: A comprehensive study. Image and Vision Computing, 27(6), 803–816.
Shang, L. & Chan, K.-P. (2009). Nonparametric discriminant HMM and application to facial expression recognition. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 2090–2096).
Simon, T., Nguyen, M. H., De la Torre, F., & Cohn, J. F. (2010). Action unit detection with segment-based SVMs. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 2737–2744), San Francisco.
Tian, Y.-L. (2004). Evaluation of face resolution for expression analysis. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Washington, DC.
Tong, Y., Liao, W., & Ji, Q. (2007). Facial action unit recognition by exploiting their dynamic and semantic relationships. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10), 1683–1699.
Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). Mining multi-label data. In O. Maimon & L. Rokach (Eds), Data Mining and Knowledge Discovery Handbook (pp. 667–685). Boston: Springer.
Tucker, L. R. (1958). An inter-battery method of factor analysis. Psychometrika, 23(2), 111–136.
Valstar, M. F. & Pantic, M. (2012). Fully automatic recognition of the temporal phases of facial actions. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 42, 28–43.
Vinciarelli, A., Pantic, M., & Bourlard, H. (2009). Social signal processing: Survey of an emerging domain. Image and Vision Computing, 27(12), 1743–1759.
Vinciarelli, A., Pantic, M., Heylen, D., et al. (2012). Bridging the gap between social animal and unsocial machine: A survey of social signal processing. IEEE Transactions on Affective Computing, 3(1), 69–87.
Wang, S., Quattoni, A., Morency, L.-P., Demirdjian, D., & Darrell, T. (2006). Hidden conditional random fields for gesture recognition. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 1097–1104), New York.
Wöllmer, M., Eyben, F., Reiter, S., et al. (2008). Abandoning emotion classes – towards continuous emotion recognition with modelling of long-range dependencies. In Proceedings of InterSpeech (pp. 597–600), Brisbane, Australia.
Wright, J., Ma, Y., Mairal, J., et al. (2010). Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, 98(6), 1031–1044.
Yan, Y., Rosales, R., Fung, G., & Dy, J. (2012). Modeling multiple annotator expertise in the semi-supervised learning scenario. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA.
Yang, P., Liu, Q., & Metaxas, D. N. (2009a). Boosting encoded dynamic features for facial expression recognition. Pattern Recognition Letters, 2, 132–139.
Yang, P., Liu, Q., & Metaxas, D. N. (2009b). Rankboost with L1 regularization for facial expression recognition and intensity estimation. In Proceedings of IEEE International Conference on Computer Vision (pp. 1018–1025), Kyoto, Japan.
Zhang, Y. & Ji, Q. (2005). Active and dynamic information fusion for facial expression understanding from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5), 699–714.