17 - Social Signal Processing for Automatic Role Recognition

from Part II - Machine Analysis of Social Signals

Published online by Cambridge University Press: 13 July 2017

Alessandro Vinciarelli
University of Glasgow
Judee K. Burgoon
University of Arizona
Nadia Magnenat-Thalmann
Université de Genève
Maja Pantic
Imperial College London
Alessandro Vinciarelli
University of Glasgow
According to the Oxford Dictionary of Sociology, “Role is a key concept in sociological theory. It highlights the social expectations attached to particular social positions and analyses the workings of such expectations” (Scott & Marshall, 2005). Furthermore, “Role theory concerns one of the most important features of social life, characteristic behaviour patterns or roles” (Biddle, 1986). Besides stating that the notion of role is crucial in sociological inquiry, these definitions introduce the two main elements of role theory, namely expectations and characteristic behaviour patterns. In particular, they suggest that the expectations of others, typically associated with the position someone holds in a given social context, shape roles into stable and recognizable behavioural patterns.

Social signal processing (SSP) relies on a similar key idea: social and psychological phenomena leave physical, machine-detectable traces in the form of both verbal (e.g., lexical choices) and nonverbal (prosody, postures, facial expressions, etc.) behavioural cues (Vinciarelli, Pantic, & Bourlard, 2009; Vinciarelli et al., 2012). In particular, most SSP work aims to infer phenomena like conflict, personality, mimicry, effectiveness of delivery, etc. automatically from verbal and nonverbal behaviour. Hence, given the tight relationship between roles and behavioural patterns, SSP methodologies appear to be particularly suitable for mapping observable behaviour into roles, i.e. for performing automatic role recognition (ARR). Not surprisingly, ARR was one of the earliest problems addressed in the SSP community, and the proposed approaches typically include three main steps, namely person detection (segmentation of raw data streams into segments corresponding to a given individual), behavioural cue extraction (detection and representation of relevant behavioural cues), and role recognition (mapping of detected cues into roles). Most of the works presented in the literature report experiments on two main types of data, i.e. meeting recordings and broadcast material. The probable reason is that these contexts are naturalistic, but sufficiently constrained to allow effective automatic analysis.
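The three steps listed above can be sketched in a few lines of Python. This is only a minimal illustration and not the method of any work cited here: person detection is assumed to have been done already (the speaker segments are made up), the cues are limited to speaking time and number of turns, and the role labels "moderator" and "participant", like the names Segment, extract_cues and recognize_roles, are hypothetical. A real system would typically replace the toy rule in the last step with a trained classifier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One speaker turn, i.e. the output of the person detection step."""
    speaker: str
    start: float  # seconds
    end: float    # seconds


def extract_cues(segments):
    """Step 2: summarize each speaker's segments as simple turn-taking cues."""
    total = sum(s.end - s.start for s in segments)
    time = defaultdict(float)
    turns = defaultdict(int)
    for s in segments:
        time[s.speaker] += s.end - s.start
        turns[s.speaker] += 1
    return {
        spk: {"speaking_fraction": time[spk] / total, "turns": turns[spk]}
        for spk in time
    }


def recognize_roles(cues):
    """Step 3: map cues to roles with a toy rule (hypothetical labels).

    The speaker with the most turns is labelled "moderator";
    everyone else is labelled "participant".
    """
    top = max(cues, key=lambda spk: cues[spk]["turns"])
    return {spk: ("moderator" if spk == top else "participant") for spk in cues}


# Step 1 (person detection) is assumed done; these segments are invented.
segments = [
    Segment("A", 0.0, 5.0),
    Segment("B", 5.0, 20.0),
    Segment("A", 20.0, 23.0),
    Segment("C", 23.0, 35.0),
    Segment("A", 35.0, 40.0),
]
cues = extract_cues(segments)
roles = recognize_roles(cues)
print(roles)
```

In this invented example speaker A takes three short turns while B and C take one long turn each, so the rule labels A as the moderator even though B speaks for longer overall.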

The rest of this chapter is organized as follows: role recognition technology, which introduces the main technological components of an ARR system; previous work, which surveys the most important ARR approaches proposed in the literature; open issues, which outlines the main open issues and challenges of the field; and the last section, which draws some conclusions.

Social Signal Processing, pp. 225–233
Publisher: Cambridge University Press
Print publication year: 2017

Banerjee, S. & Rudnicky, A. I. (2004). Using simple speech based features to detect the state of a meeting and the roles of the meeting participants. In Proceedings of International Conference on Spoken Language Processing (pp. 221–231).
Barzilay, R., Collins, M., Hirschberg, J., & Whittaker, S. (2000). The rules behind the roles: Identifying speaker roles in radio broadcasts. In Proceedings of the 17th National Conference on Artificial Intelligence (pp. 679–684).
Benne, K. D. & Sheats, P. (1948). Functional roles of group members. Journal of Social Issues, 3(2), 41–49.
Biddle, B. J. (1986). Recent developments in role theory. Annual Review of Sociology, 12, 67–92.
Bigot, B., Ferrané, I., Pinquier, J., & André-Obrecht, R. (2010). Speaker role recognition to help spontaneous conversational speech detection. In Proceedings of International Workshop on Searching Spontaneous Conversational Speech (pp. 5–10).
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. New York: Springer.
Dong, W., Lepri, B., Cappelletti, A., et al. (2007, November). Using the influence model to recognize functional roles in meetings. In Proceedings of the 9th International Conference on Multimodal Interfaces (pp. 271–278).
Forsyth, D. A., Arikan, O., Ikemoto, L., O'Brien, J., & Ramanan, D. (2006). Computational studies of human motion part 1: Tracking and motion synthesis. Foundations and Trends in Computer Graphics and Vision, 1(2), 77–254.
Garg, N., Favre, S., Salamin, H., Hakkani-Tür, D., & Vinciarelli, A. (2008). Role recognition for meeting participants: An approach based on lexical information and social network analysis. In Proceedings of the ACM International Conference on Multimedia (pp. 693–696).
Gatica-Perez, D. (2009). Automatic nonverbal analysis of social interaction in small groups: A review. Image and Vision Computing, 27(12), 1775–1787.
Laskowski, K., Ostendorf, M., & Schultz, T. (2008). Modeling vocal interaction for text-independent participant characterization in multi-party conversation. In Proceedings of the 9th ISCA/ACL SIGdial Workshop on Discourse and Dialogue (pp. 148–155), June.
Liu, Y. (2006). Initial study on automatic identification of speaker role in broadcast news speech. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers (pp. 81–84), June.
McCowan, I., Carletta, J., Kraaij, W., et al. (2005). The AMI meeting corpus. In Proceedings of the 5th International Conference on Methods and Techniques in Behavioral Research (pp. 137–140), Wageningen, Netherlands.
Pianesi, F., Zancanaro, M., Lepri, B., & Cappelletti, A. (2008). A multimodal annotated corpus of consensus decision making meetings. Language Resources and Evaluation, 41(3–4), 409–429.
Salamin, H., Favre, S., & Vinciarelli, A. (2009). Automatic role recognition in multiparty recordings: Using social affiliation networks for feature extraction. IEEE Transactions on Multimedia, 11(7), 1373–1380.
Sapru, A. & Bourlard, H. (2014). Detecting speaker roles and topic changes in multiparty conversations using latent topic models. In Proceedings of InterSpeech (pp. 2882–2886).
Schapire, R. E. & Singer, Y. (2000). BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2/3), 135.
Scott, J. & Marshall, G. (Eds) (2005). Dictionary of Sociology. Oxford: Oxford University Press.
Tranter, S. E. & Reynolds, D. A. (2006). An overview of automatic speaker diarization systems. IEEE Transactions on Audio, Speech, and Language Processing, 14(5), 1557–1565.
Valente, F., Vijayasenan, D., & Motlicek, P. (2011). Speaker diarization of meetings based on speaker role n-gram models. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4416–4419), Prague.
Vinciarelli, A. (2007). Speakers role recognition in multiparty audio recordings using social network analysis and duration distribution modeling. IEEE Transactions on Multimedia, 9(6), 1215–1226.
Vinciarelli, A., Chatziioannou, P., & Esposito, A. (2015). When the words are not everything: The use of laughter, fillers, back-channel, silence and overlapping speech in phone calls. Frontiers in ICT, 2.
Vinciarelli, A. & Favre, S. (2007). Broadcast news story segmentation using social network analysis and hidden Markov models. In Proceedings of the ACM International Conference on Multimedia (pp. 261–264).
Vinciarelli, A., Fernandez, F., & Favre, S. (2007). Semantic segmentation of radio programs using social network analysis and duration distribution modeling. In Proceedings of the IEEE International Conference on Multimedia and Expo (pp. 779–782).
Vinciarelli, A., Pantic, M., & Bourlard, H. (2009). Social signal processing: Survey of an emerging domain. Image and Vision Computing, 27(12), 1743–1759.
Vinciarelli, A., Pantic, M., Heylen, D., et al. (2012). Bridging the gap between social animal and unsocial machine: A survey of social signal processing. IEEE Transactions on Affective Computing, 3(1), 69–87.
Vinciarelli, A., Salamin, H., & Polychroniou, A. (2014). Negotiating over mobile phones: Calling or being called can make the difference. Cognitive Computation, 6(4), 677–688.
Weng, C. Y., Chu, W. T., & Wu, J. L. (2009). RoleNet: Movie analysis from the perspective of social networks. IEEE Transactions on Multimedia, 11(2), 256–271.
Xu, R. & Wunsch, D. (2005). Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3), 645–678.
Yang, M. H., Kriegman, D., & Ahuja, N. (2002). Detecting faces in images: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(1), 34–58.
Zancanaro, M., Lepri, B., & Pianesi, F. (2006). Automatic detection of group functional roles in face to face interactions. In Proceedings of International Conference on Multimodal Interfaces (pp. 47–54).
