
24 - Social Signal Processing for Surveillance

from Part IV - Applications of Social Signal Processing

Published online by Cambridge University Press:  13 July 2017

Dong Seon Cheng
Hankuk University of Foreign Studies
Marco Cristani
University of Verona
Judee K. Burgoon
University of Arizona
Nadia Magnenat-Thalmann
Université de Genève
Maja Pantic
Imperial College London
Alessandro Vinciarelli
University of Glasgow


Automated surveillance of human activities has traditionally been a computer vision field concerned with the recognition of motion patterns and with the production of high-level descriptions of actions and interactions among entities of interest (Cedras & Shah, 1995; Aggarwal & Cai, 1999; Gavrila, 1999; Moeslund, Hilton, & Krüger, 2006; Buxton, 2003; Hu et al., 2004; Turaga et al., 2008; Dee & Velastin, 2008; Aggarwal & Ryoo, 2011; Borges, Conci, & Cavallaro, 2013). The study of human activities has been revitalized in the last five years by addressing the so-called social signals (Pentland, 2007). These nonverbal cues, inspired by the social, affective, and psychological literature (Vinciarelli, Pantic, & Bourlard, 2009), have allowed a more principled understanding of how humans act and react to other people and to their environment.

Social Signal Processing (SSP) is the scientific field devoted to the systematic, algorithmic, and computational analysis of social signals, drawing key concepts from anthropology and social psychology (Vinciarelli et al., 2009). In particular, SSP does not stop at modeling human activities: it aims at coding and decoding human behavior, that is, at unveiling the underlying hidden states that drive a person to act in a particular way. This challenge is supported by decades of investigation in the human sciences (psychology, anthropology, sociology, etc.), which showed how humans use nonverbal behavioral cues, such as facial expressions, vocalizations (laughter, fillers, back-channels, etc.), gestures, and postures, to convey, often outside conscious awareness, their attitude toward other people and social environments, as well as their emotions (Richmond & McCroskey, 1995). Understanding these cues is thus paramount to understanding the social meaning of human activities.
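To make the idea of decoding hidden states from observed nonverbal cues concrete, the following minimal sketch (not from the chapter) recovers a hypothetical hidden "engagement" state from a sequence of cue labels using the classic Viterbi algorithm. The state set, cue vocabulary, and all probabilities below are illustrative assumptions, not values from the literature.

```python
# Minimal sketch: Viterbi decoding of a hypothetical hidden social state
# ("engaged" vs. "disengaged") from observed nonverbal-cue labels.
# All names and probabilities are illustrative assumptions.

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (probability, previous state) of the best path ending in s at time t
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        row = {}
        for s in states:
            prob, prev = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
            row[s] = (prob, prev)
        best.append(row)
    # Backtrack from the most probable final state
    state = max(best[-1], key=lambda s: best[-1][s][0])
    path = [state]
    for row in reversed(best[1:]):
        state = row[state][1]
        path.append(state)
    return list(reversed(path))

states = ("engaged", "disengaged")
cues = ["eye_contact", "nod", "gaze_away", "gaze_away"]
start_p = {"engaged": 0.6, "disengaged": 0.4}
trans_p = {
    "engaged":    {"engaged": 0.7, "disengaged": 0.3},
    "disengaged": {"engaged": 0.3, "disengaged": 0.7},
}
emit_p = {
    "engaged":    {"eye_contact": 0.5, "nod": 0.4, "gaze_away": 0.1},
    "disengaged": {"eye_contact": 0.1, "nod": 0.1, "gaze_away": 0.8},
}

decoded = viterbi(cues, states, start_p, trans_p, emit_p)
print(decoded)  # -> ['engaged', 'engaged', 'disengaged', 'disengaged']
```

Real SSP systems replace the hand-set tables with parameters learned from annotated recordings and use richer observation models, but the coding/decoding structure is the same: observed cues are evidence, and the social state is a latent variable to be inferred.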

The formal marriage of automated video surveillance with Social Signal Processing had its programmatic start at SISM 2010, the International Workshop on Socially Intelligent Surveillance and Monitoring, held in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition. At that venue, the discussion focused on what kinds of social signals can be captured in a generic surveillance scenario and on the specific scenarios where the modeling of social aspects could be most beneficial.

After 2010, hybridizations of SSP with surveillance applications have grown rapidly in number, and systematic essays on the topic have started to appear in the computer vision literature (Cristani et al., 2013).

Social Signal Processing, pp. 331–348
Publisher: Cambridge University Press
Print publication year: 2017



Abbasi, A. & Chen, H. (2008). Writeprints: A stylometric approach to identity-level identification and similarity detection in cyberspace. ACM Transactions on Information Systems, 26(2), 1–29.
Aggarwal, J. K. & Cai, Q. (1999). Human motion analysis: A review. Computer Vision and Image Understanding, 73(3), 428–440.
Aggarwal, J. K. & Ryoo, M. S. (2011). Human activity analysis: A review. ACM Computing Surveys, 43, 1–43.
Ambady, N. & Rosenthal, R. (1992). Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis. Psychological Bulletin, 111(2), 256–274.
Anderson, R. J. (2001). Security Engineering: A Guide to Building Dependable Distributed Systems. New York: John Wiley & Sons.
Andriluka, M., Roth, S., & Schiele, B. (2009). Pictorial structures revisited: People detection and articulated pose estimation. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (pp. 1014–1021).
Ba, S. O. & Odobez, J. M. (2006). A study on visual focus of attention recognition from head pose in a meeting room. Lecture Notes in Computer Science, 4299, 75–87.
Bazzani, L., Cristani, M., & Murino, V. (2012). Decentralized particle filter for joint individual-group tracking. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1888–1893).
Bazzani, L., Cristani, M., Tosato, D., et al. (2011). Social interactions by visual focus of attention in a three-dimensional environment. Expert Systems, 30(2), 115–127.
Benfold, B. & Reid, I. (2009). Guiding visual surveillance by tracking human attention. In Proceedings of the 20th British Machine Vision Conference, September.
Bolle, R., Connell, J., Pankanti, S., Ratha, N., & Senior, A. (2003). Guide to Biometrics. New York: Springer.
Borges, P. V. K., Conci, N., & Cavallaro, A. (2013). Video-based human behavior understanding: A survey. IEEE Transactions on Circuits and Systems for Video Technology, 23(11), 1993–2008.
Buxton, H. (2003). Learning and understanding dynamic scene activity: A review. Image and Vision Computing, 21(1), 125–136.
Cassell, J. (1998). A framework for gesture generation and interpretation. In R. Cipolla & A. Pentland (Eds.), Computer Vision in Human–Machine Interaction (pp. 191–215). New York: Cambridge University Press.
Cedras, C. & Shah, M. (1995). Motion-based recognition: A survey. Image and Vision Computing, 13(2), 129–155.
Chen, C. & Odobez, J. (2012). We are not contortionists: Coupled adaptive learning for head and body orientation estimation in surveillance video. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1544–1551).
Cristani, M., Bazzani, L., Paggetti, G., et al. (2011). Social interaction discovery by statistical analysis of F-formations. In J. Hoey, S. McKenna, & E. Trucco (Eds.), Proceedings of British Machine Vision Conference (pp. 23.1–23.12). Guildford, UK: BMVA Press.
Cristani, M., Paggetti, G., Vinciarelli, A., et al. (2011). Towards computational proxemics: Inferring social relations from interpersonal distances. In Proceedings of Third IEEE International Conference on Social Computing (pp. 290–297).
Cristani, M., Pesarin, A., Vinciarelli, A., Crocco, M., & Murino, V. (2011). Look at who's talking: Voice activity detection by automated gesture analysis. In Proceedings of the Workshop on Interactive Human Behavior Analysis in Open or Public Spaces (InterHub 2011).
Cristani, M., Raghavendra, R., Del Bue, A., & Murino, V. (2013). Human behavior analysis in video surveillance: A social signal processing perspective. Neurocomputing, 100(2), 86–97.
Cristani, M., Roffo, G., Segalin, C., et al. (2012). Conversationally inspired stylometric features for authorship attribution in instant messaging. In Proceedings of the 20th ACM International Conference on Multimedia (pp. 1121–1124).
Curhan, J. R. & Pentland, A. (2007). Thin slices of negotiation: Predicting outcomes from conversational dynamics within the first five minutes. Journal of Applied Psychology, 92(3), 802–811.
Dee, H. M. & Velastin, S. A. (2008). How close are we to solving the problem of automated visual surveillance? Machine Vision and Applications, 19(2), 329–343.
Deng, Z., Xu, D., Zhang, X., & Jiang, X. (2012). IntroLib: Efficient and transparent library call introspection for malware forensics. In 12th Annual Digital Forensics Research Conference (pp. 13–23).
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification. New York: John Wiley & Sons.
Ellison, N. B., Steinfield, C., & Lampe, C. (2007). The benefits of Facebook "friends": Social capital and college students' use of online social network sites. Journal of Computer-Mediated Communication, 12(4), 1143–1168.
Fuchs, C. (2012). Internet and Surveillance: The Challenges of Web 2.0 and Social Media. New York: Routledge.
Gavrila, D. M. (1999). The visual analysis of human movement: A survey. Computer Vision and Image Understanding, 73(1), 82–98.
Goffman, E. (1966). Behavior in Public Places: Notes on the Social Organization of Gatherings. New York: Free Press.
Groh, G., Lehmann, A., Reimers, J., Friess, M. R., & Schwarz, L. (2010). Detecting social situations from interaction geometry. In Proceedings of the 2010 IEEE Second International Conference on Social Computing (pp. 1–8).
Hall, E. T. (1966). The Hidden Dimension. Garden City, NY: Doubleday.
Harman, J. P., Hansen, C. E., Cochran, M. E., & Lindsey, C. R. (2005). Liar, liar: Internet faking but not frequency of use affects social skills, self-esteem, social anxiety, and aggression. Cyberpsychology & Behavior, 8(1), 1–6.
Helbing, D. & Molnár, P. (1995). Social force model for pedestrian dynamics. Physical Review E, 51(5), 4282–4287.
Hu, W., Tan, T., Wang, L., & Maybank, S. (2004). A survey on visual surveillance of object motion and behaviors. IEEE Transactions on Systems, Man and Cybernetics, 34, 334–352.
Hung, H., Huang, Y., Yeo, C., & Gatica-Perez, D. (2008). Associating audio-visual activity cues in a dominance estimation framework. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, June 23–28, Anchorage, AK.
Hung, H. & Kröse, B. (2011). Detecting F-formations as dominant sets. In Proceedings of the International Conference on Multimodal Interaction (pp. 231–238).
Kendon, A. (1990). Conducting Interaction: Patterns of Behavior in Focused Encounters. New York: Cambridge University Press.
Kuncheva, L. I. (2007). A stability index for feature selection. In Proceedings of IASTED International Multi-Conference Artificial Intelligence and Applications (pp. 390–395).
Laptev, I. (2005). On space-time interest points. International Journal of Computer Vision, 64(2–3), 107–123.
Li, Y., Fathi, A., & Rehg, J. M. (2013). Learning to predict gaze in egocentric video. In Proceedings of 14th IEEE International Conference on Computer Vision (pp. 3216–3223).
Lin, W.-C. & Liu, Y. (2007). A lattice-based MRF model for dynamic near-regular texture tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5), 777–792.
Liu, H. & Motoda, H. (2008). Computational Methods of Feature Selection. Boca Raton, FL: Chapman & Hall/CRC.
Liu, X., Krahnstoever, N., Yu, T., & Tu, P. (2007). What are customers looking at? In Proceedings of IEEE Conference on Advanced Video and Signal Based Surveillance (pp. 405–410).
Livingstone, S. & Brake, D. R. (2010). On the rapid rise of social networking sites: New findings and policy implications. Children & Society, 24(1), 75–83.
Lott, D. F. & Sommer, R. (1967). Seating arrangements and status. Journal of Personality and Social Psychology, 7(1), 90–95.
Mauthner, T., Donoser, M., & Bischof, H. (2008). Robust tracking of spatial related components. In Proceedings of the International Conference on Pattern Recognition (pp. 1–4).
Moeslund, T. B., Hilton, A., & Krüger, V. (2006). A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding, 104(2), 90–126.
Newman, R. C. (2006). Cybercrime, identity theft, and fraud: Practicing safe Internet – network security threats and vulnerabilities. In Proceedings of the 3rd Annual Conference on Information Security Curriculum Development (pp. 68–78).
Oberschall, A. (1978). Theories of social conflict. Annual Review of Sociology, 4, 291–315.
Oikonomopoulos, A., Patras, I., & Pantic, M. (2011). Spatiotemporal localization and categorization of human actions in unsegmented image sequences. IEEE Transactions on Image Processing, 20(4), 1126–1140.
Orebaugh, A. & Allnutt, J. (2009). Classification of instant messaging communications for forensics analysis. International Journal of Forensic Computer Science, 1, 22–28.
Panero, J. & Zelnik, M. (1979). Human Dimension and Interior Space: A Source Book of Design. New York: Whitney Library of Design.
Pang, S. K., Li, J., & Godsill, S. (2007). Models and algorithms for detection and tracking of coordinated groups. In Proceedings of International Symposium on Image and Signal Processing and Analysis (pp. 504–509).
Park, S. & Trivedi, M. M. (2007). Multi-person interaction and activity analysis: A synergistic track- and body-level analysis framework. Machine Vision and Applications, 18, 151–166.
Pavan, M. & Pelillo, M. (2007). Dominant sets and pairwise clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(1), 167–172.
Pellegrini, S., Ess, A., Schindler, K., & Van Gool, L. (2009). You'll never walk alone: Modeling social behavior for multi-target tracking. In Proceedings of 12th International Conference on Computer Vision, Kyoto, Japan (pp. 261–268).
Pellegrini, S., Ess, A., & Van Gool, L. (2010). Improving data association by joint modeling of pedestrian trajectories and groupings. In Proceedings of European Conference on Computer Vision (pp. 452–465).
Pentland, A. (2007). Social signal processing. IEEE Signal Processing Magazine, 24(4), 108–111.
Pesarin, A., Cristani, M., Murino, V., & Vinciarelli, A. (2012). Conversation analysis at work: Detection of conflict in competitive discussions through semi-automatic turn-organization analysis. Cognitive Processing, 13(2), 533–540.
Pianesi, F., Mana, N., Ceppelletti, A., Lepri, B., & Zancanaro, M. (2008). Multimodal recognition of personality traits in social interactions. In Proceedings of International Conference on Multimodal Interfaces (pp. 53–60).
Popa, M., Koc, A. K., Rothkrantz, L. J. M., Shan, C., & Wiggers, P. (2012). Kinect sensing of shopping related actions. In R. Wichert, K. van Laerhoven, & J. Gelissen (Eds.), Constructing Ambient Intelligence (vol. 277, pp. 91–100). Berlin: Springer.
Qin, Z. & Shelton, C. R. (2012). Improving multi-target tracking via social grouping. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1972–1978).
Rajagopalan, S. S., Dhall, A., & Goecke, R. (2013). Self-stimulatory behaviours in the wild for autism diagnosis. In Proceedings of IEEE Workshop on Decoding Subtle Cues from Social Interactions (associated with ICCV 2013) (pp. 755–761).
Richmond, V. & McCroskey, J. (1995). Nonverbal Behaviors in Interpersonal Relations. Boston: Allyn and Bacon.
Robertson, N. M. & Reid, I. D. (2011). Automatic reasoning about causal events in surveillance video. EURASIP Journal on Image and Video Processing, 1, 1–19.
Russo, N. (1967). Connotation of seating arrangements. The Cornell Journal of Social Relations, 2(1), 37–44.
Salamin, H., Favre, S., & Vinciarelli, A. (2009). Automatic role recognition in multiparty recordings: Using social affiliation networks for feature extraction. IEEE Transactions on Multimedia, 11(7), 1373–1380.
Schegloff, E. (2000). Overlapping talk and the organisation of turn-taking for conversation. Language in Society, 29(1), 1–63.
Scovanner, P. & Tappen, M. F. (2009). Learning pedestrian dynamics from the real world. In Proceedings of International Conference on Computer Vision (pp. 381–388).
Smith, K., Ba, S., Odobez, J., & Gatica-Perez, D. (2008). Tracking the visual focus of attention for a varying number of wandering people. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(7), 1–18.
Stiefelhagen, R., Finke, M., Yang, J., & Waibel, A. (1999). From gaze to focus of attention. Lecture Notes in Computer Science, 1614, 761–768.
Stiefelhagen, R., Yang, J., & Waibel, A. (2002). Modeling focus of attention for meeting indexing based on multiple cues. IEEE Transactions on Neural Networks, 13, 928–938.
Tajfel, H. (1982). Social psychology of intergroup relations. Annual Review of Psychology, 33, 1–39.
Tosato, D., Spera, M., Cristani, M., & Murino, V. (2013). Characterizing humans on Riemannian manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 2–15.
Turaga, P., Chellappa, R., Subrahmanian, V. S., & Udrea, O. (2008). Machine recognition of human activities: A survey. IEEE Transactions on Circuits and Systems for Video Technology, 18(11), 1473–1488.
Vinciarelli, A., Pantic, M., & Bourlard, H. (2009). Social signal processing: Survey of an emerging domain. Image and Vision Computing, 27(12), 1743–1759.
Yamaguchi, K., Berg, A. C., Ortiz, L. E., & Berg, T. L. (2011). Who are you with and where are you going? In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1345–1352).
Yang, Y. & Ramanan, D. (2011). Articulated pose estimation with flexible mixtures-of-parts. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 1385–1392).
Zen, G., Lepri, B., Ricci, E., & Lanz, O. (2010). Space speaks: Towards socially and personality aware visual surveillance. In Proceedings of the 1st ACM International Workshop on Multimodal Pervasive Video Analysis (pp. 37–42).
Zhou, L. & Zhang, D. (2004). Can online behavior unveil deceivers? An exploratory investigation of deception in instant messaging. In Proceedings of the Hawaii International Conference on System Sciences (no. 37, p. 22).
