  • Print publication year: 2017
  • Online publication date: July 2017

18 - Machine Learning Methods for Social Signal Processing

from Part II - Machine Analysis of Social Signals


Introduction

In this chapter we systematize, analyze, and discuss recent trends in machine learning methods for social signal processing (SSP) (Pentland, 2007). Because social signaling is often central to the subconscious decision making that shapes everyday tasks (e.g., decisions about risks and rewards, resource utilization, or interpersonal relationships), automated understanding of social signals by computers is a task of paramount importance. Machine learning has played a prominent role in the advancement of SSP over the past decade. This is, in part, due to the exponential increase in data availability, which served as a catalyst for the adoption of a new data-driven direction in affective computing. Given the difficulty of exactly modeling the latent and complex physical processes that underpin social signals, data has emerged as a means to circumvent or supplement expert- or physics-based models, such as deformable musculoskeletal models of the human body, face, or hands and their movement, neuro-dynamical models of cognitive perception, or models of human vocal production. This trend parallels the role and success of machine learning in related areas, such as computer vision (cf. Poppe, 2010; Wright et al., 2010; Grauman & Leibe, 2011) or audio, speech, and language processing (cf. Deng & Li, 2013), which serve as the core tools for analytic SSP tasks. Rather than attempt exhaustive coverage of the many approaches to data-driven SSP, which can be found in excellent surveys (Vinciarelli, Pantic, & Bourlard, 2009; Vinciarelli et al., 2012), we present the methods in the context of current modeling challenges. In particular, we identify and discuss two major modeling directions:

• Simultaneous modeling of social signals and context, and

• Modeling of annotators and the data annotation process.
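The second direction can be illustrated with a toy sketch. Given labels from several annotators of unknown reliability, a simple EM-style reweighting, in the spirit of Raykar et al. (2010) but simplified to a single accuracy parameter per annotator, recovers per-annotator reliabilities instead of trusting a plain majority vote. The label matrix below is invented for illustration, with the third annotator made deliberately noisy.

```python
import numpy as np

# Toy annotations: rows = items, columns = annotators (binary labels).
# All values are invented; annotator 2 disagrees with the others often.
labels = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
])

n_items, n_annot = labels.shape
p = np.full(n_items, 0.5)      # posterior P(true label = 1) per item
acc = np.full(n_annot, 0.7)    # per-annotator accuracy estimates

for _ in range(20):
    # E-step: posterior over each item's true label given accuracies.
    like1 = np.prod(np.where(labels == 1, acc, 1 - acc), axis=1)
    like0 = np.prod(np.where(labels == 0, acc, 1 - acc), axis=1)
    p = like1 / (like1 + like0)
    # M-step: accuracy = expected fraction of items each annotator matched.
    agree1 = (labels == 1) * p[:, None]
    agree0 = (labels == 0) * (1 - p)[:, None]
    acc = (agree1 + agree0).sum(axis=0) / n_items

print(np.round(acc, 2))  # annotator 2's estimate ends up far below the others
```

Unlike majority voting (Lam & Suen, 1997), the estimated reliabilities can then weight each annotator's contribution to the inferred ground truth.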

Context plays a crucial role in understanding the human behavioral signals that can otherwise be easily misinterpreted. For instance, a smile can be a display of politeness, contentedness, joy, irony, empathy, or a greeting, depending on the context. Yet, most SSP methods to date focus on the simpler problem of detecting a smile as a prototypical and self-contained signal.
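The point can be made concrete with a deliberately minimal sketch: the same detected smile maps to different interpretations once a context variable is conditioned on. The context categories and probability values below are invented for illustration, not drawn from any dataset or from the methods surveyed here.

```python
# Hypothetical conditional distributions P(meaning | smile, context).
# Contexts and probabilities are illustrative only.
P_MEANING = {
    "greeting":  {"politeness": 0.6, "joy": 0.3, "irony": 0.1},
    "joke":      {"politeness": 0.1, "joy": 0.7, "irony": 0.2},
    "criticism": {"politeness": 0.2, "joy": 0.1, "irony": 0.7},
}

def interpret_smile(context: str) -> str:
    """Return the most probable meaning of a detected smile in `context`."""
    dist = P_MEANING[context]
    return max(dist, key=dist.get)

for ctx in P_MEANING:
    print(ctx, "->", interpret_smile(ctx))
```

A context-blind smile detector collapses these three distinct outcomes into a single label; joint modeling of signal and context is what the first direction above aims to restore.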

Social Signal Processing
  • Online ISBN: 9781316676202
  • Book DOI: https://doi.org/10.1017/9781316676202
Amin, M. A., Afzulpurkar, N. V., Dailey, M. N., Esichaikul, V., & Batanov, D. N. (2005). Fuzzy-CMean determines the principle component pairs to estimate the degree of emotion from facial expressions. In 2nd International Conference on Fuzzy Systems and Knowledge Discovery (pp. 484–493), Changsha, China.
Bach, F. R. & Jordan, M. I. (2005). A probabilistic interpretation of canonical correlation analysis. Technical Report 688, Department of Statistics, University of California.
Bartlett, M., Littlewort, G., Frank, M., et al. (2005). Recognizing facial expression: Machine learning and application to spontaneous behavior. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 568–573), San Diego, CA.
Bartlett, M., Littlewort, G., Frank, M., et al. (2006). Fully automatic facial action recognition in spontaneous behavior. In Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (pp. 223–230), Southampton, UK.
Bazzo, J. & Lamar, M. (2004). Recognizing facial actions using Gabor wavelets with neutral face average difference. In Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (pp. 505–510), Seoul.
Black, M. J. & Yacoob, Y. (1997). Recognizing facial expressions in image sequences using local parameterized models of image motion. International Journal of Computer Vision, 25, 23–48.
Cai, D., He, X., & Han, J. (2007). Spectral regression for efficient regularized subspace learning. In Proceedings of IEEE International Conference on Computer Vision (pp. 1–8), Brazil.
Chang, K.-Y., Liu, T.-L., & Lai, S.-H. (2009). Learning partially observed hidden conditional random fields for facial expression recognition. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 533–540), Miami, FL.
Chew, S., Lucey, P., Lucey, S., et al. (2012). In the pursuit of effective affective computing: The relationship between features and registration. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 42(4), 1006–1016.
Chu, W. & Ghahramani, Z. (2005). Gaussian processes for ordinal regression. Journal of Machine Learning Research, 6, 1019–1041.
Chu, W. & Keerthi, S. S. (2005). New approaches to support vector ordinal regression. In Proceedings of the 22nd International Conference on Machine Learning (pp. 145–152), Bonn, Germany.
Chu, W.-S., De la Torre, F., & Cohn, J. (2013). Selective transfer machine for personalized facial action unit detection. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 3515–3522), Portland, OR.
Cohen, I., Sebe, N., Chen, L., Garg, A., & Huang, T. S. (2003). Facial expression recognition from video sequences: Temporal and static modelling. Computer Vision and Image Understanding, 92(1–2), 160–187.
Cowie, R., Douglas-Cowie, E., & Cox, C. (2005). Beyond emotion archetypes: Databases for emotion modelling using neural networks. Neural Networks, 18(4), 371–388.
Cowie, R., Douglas-Cowie, E., Savvidou, S., et al. (2000). “FEELTRACE”: An instrument for recording perceived emotion in real time. In Proceedings of the ISCA Workshop on Speech and Emotion (pp. 19–24), Belfast.
Dai, P., Mausam, & Weld, D. S. (2010). Decision-theoretic control of crowd-sourced workflows. In Proceedings of the 24th National Conference on Artificial Intelligence (pp. 1168–1174), Atlanta, GA.
Dai, P., Mausam, & Weld, D. S. (2011). Artificial intelligence for artificial artificial intelligence. In Proceedings of 25th AAAI Conference on Artificial Intelligence (pp. 1153–1159), San Francisco.
Delannoy, J. & McDonald, J. (2008). Automatic estimation of the dynamics of facial expression using a three-level model of intensity. In Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (pp. 1–6), Amsterdam.
De Leeuw, J. (2006). Principal component analysis of binary data by iterated singular value decomposition. Computational Statistics and Data Analysis, 50(1), 21–39.
Deng, L. & Li, X. (2013). Machine learning paradigms for speech recognition: An overview. IEEE Transactions on Audio, Speech, and Language Processing, 21(5), 1060–1089.
Van der Maaten, L. & Hendriks, E. (2012). Action unit classification using active appearance models and conditional random fields. Cognitive Processing, 13(2), 507–518.
Douglas-Cowie, E., Campbell, N., Cowie, R., & Roach, P. (2003). Emotional speech: Towards a new generation of databases. Speech Communication, 40(1), 33–60.
Ekman, P., Friesen, W., & Hager, J. (2002). Facial Action Coding System (FACS): Manual. Salt Lake City, UT: A Human Face.
Ekman, P., Friesen, W. V., & Press, C. P. (1975). Pictures of Facial Affect. Palo Alto, CA: Consulting Psychologists Press.
Fasel, B. & Luettin, J. (2000). Recognition of asymmetric facial action unit activities and intensities. In Proceedings of 15th International Conference on Pattern Recognition (pp. 110–1103), Barcelona, Spain.
Gholami, B., Haddad, W. M., & Tannenbaum, A. R. (2009). Agitation and pain assessment using digital imaging. In Proceedings of International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 2176–2179), Minneapolis, MN.
Grauman, K. & Leibe, B. (2011). Visual object recognition. Synthesis Lectures on Artificial Intelligence and Machine Learning, 5(2), 1–181.
Gunes, H. & Piccardi, M. (2009). Automatic temporal segment detection and affect recognition from face and body display. IEEE Transactions on Systems, Man, and Cybernetics, 39(1), 64–84.
Gunes, H., Piccardi, M., & Pantic, M. (2008). From the lab to the real world: Affect recognition using multiple cues and modalities. In J. Or (Ed.), Affective Computing [e-book]. www.intechopen.com/books/affective_computing.
Hamid, J., Meaney, C., Crowcroft, N., et al. (2011). Potential risk factors associated with human encephalitis: Application of canonical correlation analysis. BMC Medical Research Methodology, 11(1), 1–10.
Hamm, J., Kohler, C. G., Gur, R. C., & Verma, R. (2011). Automated facial action coding system for dynamic analysis of facial expressions in neuropsychiatric disorders. Journal of Neuroscience Methods, 200(2), 237–256.
Hammal, Z. & Cohn, J. F. (2012). Automatic detection of pain intensity. In Proceedings of the 14th ACM International Conference on Multimodal Interaction (pp. 47–52), Santa Monica, CA.
He, X. & Niyogi, P. (2004). Locality preserving projections. In Proceedings of Neural Information Processing Systems (vol. 16), Vancouver, Canada.
Hess, U., Blairy, S., & Kleck, R. (1997). The intensity of emotional facial expressions and decoding accuracy. Journal of Nonverbal Behavior, 21(4), 241–257.
Hu, C., Chang, Y., Feris, R., & Turk, M. (2004). Manifold based analysis of facial expression. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop (p. 81).
Jain, S., Hu, C., & Aggarwal, J. (2011). Facial expression recognition with temporal modeling of shapes. In IEEE International Conference on Computer Vision Workshops (pp. 1642–1649), Barcelona, Spain.
Jeni, L. A., Girard, J. M., Cohn, J. F., & De la Torre, F. (2013). Continuous AU intensity estimation using localized, sparse facial feature space. In IEEE International Conference on Automatic Face and Gesture Recognition (pp. 1–7).
Kaltwang, S., Rudovic, O., & Pantic, M. (2012). Continuous pain intensity estimation from facial expressions. Lecture Notes in Computer Science (ISVC), 7432, 368–377.
Kapoor, A., Qi, Y. A., & Picard, R. W. (2003). Fully automatic upper facial action recognition. In Proceedings of IEEE International Workshop on Analysis and Modeling of Faces and Gestures (pp. 195–202).
Khademi, M., Manzuri-Shalmani, M. T., Kiapour, M. H., & Kiaei, A. A. (2010). Recognizing combinations of facial action units with different intensity using a mixture of hidden Markov models and neural network. In Proceedings of the 9th International Conference on Multiple Classifier Systems (pp. 304–313).
Kim, M. & Pavlovic, V. (2010). Structured output ordinal regression for dynamic facial emotion intensity prediction. In Proceedings of 11th European Conference on Computer Vision (pp. 649–662), Heraklion, Crete.
Kimura, S. & Yachida, M. (1997). Facial expression recognition and its degree estimation. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 295–300), Puerto Rico.
Klami, A. & Kaski, S. (2008). Probabilistic approach to detecting dependencies between data sets. Neurocomputing, 72(1), 39–46.
Koelstra, S., Pantic, M., & Patras, I. (2010). A dynamic texture based approach to recognition of facial actions and their temporal models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1940–1954.
Lam, L. & Suen, S. (1997). Application of majority voting to pattern recognition: An analysis of its behavior and performance. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 27(5), 553–568.
Lee, C. S. & Elgammal, A. (2005). Facial expression analysis using nonlinear decomposable generative models. In Proceedings of IEEE International Workshops on Analysis and Modeling of Faces and Gestures (pp. 17–31).
Lee, K. K. & Xu, Y. (2003). Real-time estimation of facial expression intensity. In Proceedings of IEEE International Conference on Robotics and Automation (pp. 2567–2572), Taipei.
Lu, H., Plataniotis, K. N., & Venetsanopoulos, A. N. (2011). A survey of multilinear subspace learning for tensor data. Pattern Recognition, 44(7), 1540–1551.
Lucey, P., Cohn, J., Prkachin, K., Solomon, P., & Matthews, I. (2011). Painful data: The UNBC-McMaster shoulder pain expression archive database. In Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (pp. 57–64), Santa Barbara, CA.
Mahoor, M., Cadavid, S., Messinger, D., & Cohn, J. (2009). A framework for automated measurement of the intensity of non-posed facial action units. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshop (pp. 74–8), Miami, FL.
Mariooryad, S. & Busso, C. (2013). Analysis and compensation of the reaction lag of evaluators in continuous emotional annotations. In Proceedings of Humaine Association Conference on Affective Computing and Intelligent Interaction (pp. 97–108), Switzerland.
Mavadati, S., Mahoor, M., Bartlett, K., Trinh, P., & Cohn, J. (2013). DISFA: A spontaneous facial action intensity database. IEEE Transactions on Affective Computing, 4(2), 151–160.
McKeown, G., Valstar, M., Cowie, R., Pantic, M., & Schroder, M. (2012). The SEMAINE database: Annotated multimodal records of emotionally colored conversations between a person and a limited agent. IEEE Transactions on Affective Computing, 3(1), 5–17.
Metallinou, A., Katsamanis, A., Wang, Y., & Narayanan, S. (2011). Tracking changes in continuous emotion states using body language and prosodic cues. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 2288–2291), Prague.
Metallinou, A., Lee, C.-C., Busso, C., Carnicke, S., & Narayanan, S. (2010). The USC CreativeIT database: A multimodal database of theatrical improvisation. In Proceedings of the Multimodal Corpora Workshop: Advances in Capturing, Coding and Analyzing Multimodality (pp. 64–68), Malta.
Nicolaou, M. A., Gunes, H., & Pantic, M. (2010). Automatic segmentation of spontaneous data using dimensional labels from multiple coders. In Proceedings of LREC International Workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, Valletta, Malta.
Nicolaou, M. A., Gunes, H., & Pantic, M. (2011). Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space. IEEE Transactions on Affective Computing, 2(2), 92–105.
Nicolaou, M. A., Pavlovic, V., & Pantic, M. (2012). Dynamic probabilistic CCA for analysis of affective behaviour. In Proceedings of the 12th European Conference on Computer Vision (pp. 98–111), Florence, Italy.
Niitsuma, H. & Okada, T. (2005). Covariance and PCA for categorical variables. In T. Ho, D. Cheung, & H. Liu (Eds), Advances in Knowledge Discovery and Data Mining (pp. 523–528). Berlin: Springer.
Otsuka, T. & Ohya, J. (1997). Recognizing multiple persons' facial expressions using HMM based on automatic extraction of significant frames from image sequences. In Proceedings of International Conference on Image Processing (pp. 546–549), Santa Barbara, CA.
Padgett, C. & Cottrell, G. W. (1996). Representing face images for emotion classification. In Proceedings 10th Annual Conference on Neural Information Processing Systems (pp. 894–900), Denver, CO.
Pan, S. J. & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.
Pantic, M. & Bartlett, M. (2007). Machine analysis of facial expressions. In K. Delac & M. Grgic (Eds), Face Recognition [e-book]. http://www.intechopen.com/books/face_recognition.
Pantic, M. & Patras, I. (2005). Detecting facial actions and their temporal segments in nearly frontal-view face image sequences. In Proceedings of IEEE International Conference on Systems, Man and Cybernetics (pp. 3358–3363), Waikoloa, HI.
Pantic, M. & Patras, I. (2006). Dynamics of facial expression: Recognition of facial actions and their temporal segments from face profile image sequences. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 36(2), 433–449.
Pantic, M. & Rothkrantz, L. J. (2004). Facial action recognition for facial expression analysis from static face images. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 34(3), 1449–1461.
Pentland, A. (2007). Social signal processing. IEEE Signal Processing Magazine, 24(4), 108–111.
Poppe, R. (2010). A survey on vision-based human action recognition. Image and Vision Computing, 28(6), 976–990.
Posner, J., Russell, J. A., & Peterson, B. S. (2005). The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology. Development and Psychopathology, 17(3), 715–734.
Quinn, A. J. & Bederson, B. B. (2011). Human computation: A survey and taxonomy of a growing field. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 1403–1412), Vancouver.
Raykar, V. C., Yu, S., Zhao, L. H., et al. (2009). Supervised learning from multiple experts: Whom to trust when everyone lies a bit. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 889–896), Montreal.
Raykar, V. C., Yu, S., Zhao, L. H., et al. (2010). Learning from crowds. Journal of Machine Learning Research, 11, 1297–1322.
Reilly, J., Ghent, J., & McDonald, J. (2006). Investigating the dynamics of facial expression. Lecture Notes in Computer Science, 4292, 334–343.
Rudovic, O., Pavlovic, V., & Pantic, M. (2012a). Kernel conditional ordinal random fields for temporal segmentation of facial action units. In Proceedings of 12th European Conference on Computer Vision (pp. 260–269), Florence, Italy.
Rudovic, O., Pavlovic, V., & Pantic, M. (2012b). Multi-output Laplacian dynamic ordinal regression for facial expression recognition and intensity estimation. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 2634–2641), Providence, RI.
Rudovic, O., Pavlovic, V., & Pantic, M. (2013a). Automatic pain intensity estimation with heteroscedastic conditional ordinal random fields. In Proceedings of 9th International Symposium on Advances in Visual Computing (pp. 234–243), Rethymnon, Crete.
Rudovic, O., Pavlovic, V., & Pantic, M. (2013b). Context-sensitive conditional ordinal random fields for facial action intensity estimation. In Proceedings of IEEE International Conference on Computer Vision Workshops (pp. 492–499), Sydney.
Ruta, D. & Gabrys, B. (2005). Classifier selection for majority voting. Information Fusion, 6(1), 63–81.
Savrana, A., Sankur, B., & Bilge, M. (2012). Regression-based intensity estimation of facial action units. Image and Vision Computing, 30(10), 774–784.
Shan, C. (2007). Inferring facial and body language. PhD thesis, University of London.
Shan, C., Gong, S., & McOwan, P. W. (2005). Appearance manifold of facial expression. Lecture Notes in Computer Science, 3766, 221–230.
Shan, C., Gong, S., & McOwan, P. W. (2006). Dynamic facial expression recognition using a Bayesian temporal manifold model. In Proceedings of the British Machine Vision Conference (pp. 297–306), Edinburgh.
Shan, C., Gong, S., & McOwan, P. W. (2009). Facial expression recognition based on local binary patterns: A comprehensive study. Image and Vision Computing, 27(6), 803–816.
Shang, L. & Chan, K.-P. (2009). Nonparametric discriminant HMM and application to facial expression recognition. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 2090–2096).
Simon, T., Nguyen, M. H., De la Torre, F., & Cohn, J. F. (2010). Action unit detection with segment-based SVMs. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 2737–2744), San Francisco.
Tian, Y.-L. (2004). Evaluation of face resolution for expression analysis. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Washington, DC.
Tong, Y., Liao, W., & Ji, Q. (2007). Facial action unit recognition by exploiting their dynamic and semantic relationships. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(10), 1683–1699.
Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). Mining multi-label data. In O. Maimon & L. Rokach (Eds), Data Mining and Knowledge Discovery Handbook (pp. 667–685). Boston: Springer.
Tucker, L. R. (1958). An inter-battery method of factor analysis. Psychometrika, 23(2), 111–136.
Valstar, M. F. & Pantic, M. (2012). Fully automatic recognition of the temporal phases of facial actions. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 42, 28–43.
Vinciarelli, A., Pantic, M., & Bourlard, H. (2009). Social signal processing: Survey of an emerging domain. Image and Vision Computing, 27(12), 1743–1759.
Vinciarelli, A., Pantic, M., Heylen, D., et al. (2012). Bridging the gap between social animal and unsocial machine: A survey of social signal processing. IEEE Transactions on Affective Computing, 3(1), 69–87.
Wang, S., Quattoni, A., Morency, L.-P., Demirdjian, D., & Darrell, T. (2006). Hidden conditional random fields for gesture recognition. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 1097–1104), New York.
Wöllmer, M., Eyben, F., Reiter, S., et al. (2008). Abandoning emotion classes – towards continuous emotion recognition with modelling of long-range dependencies. In Proceedings of InterSpeech (pp. 597–600), Brisbane, Australia.
Wright, J., Ma, Y., Mairal, J., et al. (2010). Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, 98(6), 1031–1044.
Yan, Y., Rosales, R., Fung, G., & Dy, J. (2012). Modeling multiple annotator expertise in the semi-supervised learning scenario. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA.
Yang, P., Liu, Q., & Metaxas, D. N. (2009a). Boosting encoded dynamic features for facial expression recognition. Pattern Recognition Letters, 2, 132–139.
Yang, P., Liu, Q., & Metaxas, D. N. (2009b). Rankboost with L1 regularization for facial expression recognition and intensity estimation. In Proceedings of IEEE International Conference on Computer Vision (pp. 1018–1025), Kyoto, Japan.
Zhang, Y. & Ji, Q. (2005). Active and dynamic information fusion for facial expression understanding from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(5), 699–714.