Multimedia Implicit Tagging

doi:10.1017/9781316676202.026

26 - Multimedia Implicit Tagging

from Part IV - Applications of Social Signal Processing

Published online by Cambridge University Press: 13 July 2017

Mohammad Soleymani and

Maja Pantic

Edited by

Judee K. Burgoon,

Nadia Magnenat-Thalmann,

Maja Pantic and

Alessandro Vinciarelli

Mohammad Soleymani: Affiliation:
University of Geneva
Maja Pantic: Affiliation:
Imperial College London
Judee K. Burgoon: Affiliation:
University of Arizona
Nadia Magnenat-Thalmann: Affiliation:
Université de Genève
Maja Pantic: Affiliation:
Imperial College London
Alessandro Vinciarelli: Affiliation:
University of Glasgow

Summary

Introduction

Social and behavioral signals carry invaluable information about how audiences perceive multimedia content. By assessing audience responses, we can generate tags, summaries, and other forms of metadata for multimedia representation and indexing. Tags are a form of metadata that enables a retrieval system to find and re-find content of interest (Larson et al., 2011). Unlike classic tagging schemes, which require users’ direct input, implicit human-centered tagging (IHCT) was proposed (Pantic & Vinciarelli, 2009) to generate tags without any specific input or effort from users. Translating behavioral responses into tags yields “implicit” tags: no direct input is needed because users display their reactions to multimedia spontaneously (Soleymani & Pantic, 2012).

User-generated explicit tags are not always assigned with the intention of describing the content; they may instead be given to promote the users themselves (Pantic & Vinciarelli, 2009). Implicit tags have the advantage of being detected for a specific goal relevant to a given application. For example, an online radio station interested in the mood of its songs can assess listeners’ emotions, and a marketing company can assess the success of its video advertisements.

Implicit tags can also complement existing explicit tags, and they can be used to filter out tags that are not relevant to the content (Soleymani & Pantic, 2013; Soleymani, Kaltwang, & Pantic, 2013). A scheme of implicit tagging versus explicit tagging is shown in Figure 26.1. Industry interest in this topic has been growing recently (Klinghult, 2012; McDuff, El Kaliouby, & Picard, 2012; Fleureau, Guillotel, & Orlac, 2013; Silveira et al., 2013), which is a sign of its significance.

Analyzing spontaneous reactions to multimedia content can assist multimedia indexing in the following scenarios: (i) direct translation to tags – users’ spontaneous reactions are translated into emotions or preferences, e.g., interesting, funny, disgusting, scary (Kierkels, Soleymani, & Pun, 2009; Soleymani, Pantic, & Pun, 2012; Petridis & Pantic, 2009; Koelstra et al., 2010; Silveira et al., 2013; Kurdyukova, Hammer, & André, 2012);

Type: Chapter
Information: Social Signal Processing, pp. 368–378

DOI: https://doi.org/10.1017/9781316676202.026

Publisher: Cambridge University Press

Print publication year: 2017

References

Abadi, M. K., Kia, S. M., Subramanian, R., Avesani, P., & Sebe, N. (2013). User-centric affective video tagging from MEG and peripheral physiological responses. In Proceedings of 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (pp. 582–587).

Abadi, M. K., Staiano, J., Cappelletti, A., Zancanaro, M., & Sebe, N. (2013). Multimodal engagement classification for affective cinema. In Proceedings of 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (pp. 411–416).

Arapakis, I., Athanasakos, K., & Jose, J. M. (2010). A comparison of general vs personalised affective models for the prediction of topical relevance. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 371–378).

Arapakis, I., Konstas, I., & Jose, J. M. (2009). Using facial expressions and peripheral physiological signals as implicit indicators of topical relevance. In Proceedings of the 17th ACM International Conference on Multimedia (pp. 461–470).

Arapakis, I., Moshfeghi, Y., Joho, H., et al. (2009). Integrating facial expressions into user profiling for the improvement of a multimodal recommender system. In Proceedings of IEEE International Conference on Multimedia and Expo (pp. 1440–1443).

Auer, P., Hussain, Z., Kaski, S., et al. (2010). Pinview: Implicit feedback in content-based image retrieval. In Proceedings of JMLR: Workshop on Applications of Pattern Analysis (pp. 51–57).

Benini, S., Canini, L., & Leonardi, R. (2011). A connotative space for supporting movie affective recommendation. IEEE Transactions on Multimedia, 13(6), 1356–1370.

Buscher, G., Van Elst, L., & Dengel, A. (2009). Segment-level display time as implicit feedback: A comparison to eye tracking. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 67–74).

Chênes, C., Chanel, G., Soleymani, M., & Pun, T. (2012). Highlight detection in movie scenes through inter-users, physiological linkage. In N. Ramzan, R. van Zwol, J.-S. Lee, K. Clüver, & X.-S. Hua (Eds), Social Media Retrieval (pp. 217–238). Berlin: Springer.

Dietz, R. B. & Lang, A. (1999). Æffective agents: Effects of agent affect on arousal, attention, liking and learning. In Proceedings of the Third International Cognitive Technology Conference, San Francisco.

Eggink, J. & Bland, D. (2012). A large scale experiment for mood-based classification of TV programmes. In Proceedings of IEEE International Conference on Multimedia and Expo (pp. 140–145).

Fleureau, J., Guillotel, P., & Huynh-Thu, Q. (2012). Physiological-based affect event detector for entertainment video applications. IEEE Transactions on Affective Computing, 3(3), 379–385.

Fleureau, J., Guillotel, P., & Orlac, I. (2013). Affective benchmarking of movies based on the physiological responses of a real audience. In Proceedings of 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (pp. 73–77).

Goldberg, L. R., Johnson, J. A., Eber, H. W., et al. (2006). The international personality item pool and the future of public-domain personality measures. Journal of Research in Personality, 40(1), 84–96.

Haji Mirza, S., Proulx, M., & Izquierdo, E. (2012). Reading users’ minds from their eyes: A method for implicit image annotation. IEEE Transactions on Multimedia, 14(3), 805–815.

Hanjalic, A. & Xu, L.-Q. (2005). Affective video content representation and modeling. IEEE Transactions on Multimedia, 7(1), 143–154.

Hardoon, D. R. & Pasupa, K. (2010). Image ranking with implicit feedback from eye movements. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications (pp. 291–298).

Jiao, J. & Pantic, M. (2010). Implicit image tagging via facial information. In Proceedings of the 2nd International Workshop on Social Signal Processing (pp. 59–64).

Joachims, T., Granka, L., Pan, B., Hembrooke, H., & Gay, G. (2005). Accurately interpreting clickthrough data as implicit feedback. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 154–161).

Joho, H., Jose, J. M., Valenti, R., & Sebe, N. (2009). Exploiting facial expressions for affective video summarisation. In Proceedings of the ACM International Conference on Image and Video Retrieval, New York.

Joho, H., Staiano, J., Sebe, N., & Jose, J. (2010). Looking at the viewer: Analysing facial activity to detect personal highlights of multimedia contents. Multimedia Tools and Applications, 51(2), 505–523.

Kelly, L. & Jones, G. (2010). Biometric response as a source of query independent scoring in lifelog retrieval. In C. Gurrin, Y. He, G. Kazai, et al. (Eds), Advances in Information Retrieval (vol. 5993, pp. 520–531). Berlin: Springer.

Kierkels, J. J. M., Soleymani, M., & Pun, T. (2009). Queries and tags in affect-based multimedia retrieval. In Proceedings of the 2009 IEEE International Conference on Multimedia and Expo (pp. 1436–1439).

Klinghult, G. (2012). Camera Button with Integrated Sensors. US Patent App. 13/677,517.

Koelstra, S., Mühl, C., & Patras, I. (2009). EEG analysis for implicit tagging of video data. In Proceedings of 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (pp. 1–6).

Koelstra, S., Mühl, C., Soleymani, M., et al. (2012). DEAP: A database for emotion analysis using physiological signals. IEEE Transactions on Affective Computing, 3, 18–31.

Koelstra, S., Yazdani, A., Soleymani, M., et al. (2010). Single trial classification of EEG and peripheral physiological signals for recognition of emotions induced by music videos. In Y. Yao (Ed.), Brain Informatics (vol. 6334, pp. 89–100). Berlin: Springer.

Kurdyukova, E., Hammer, S., & André, E. (2012). Personalization of content on public displays driven by the recognition of group context. In F. Paternò, B. de Ruyter, P. Markopoulos, et al. (Eds), Ambient Intelligence (vol. 7683, pp. 272–287). Berlin: Springer.

Lang, P., Bradley, M., & Cuthbert, B. (2005). International Affective Picture System (IAPS): Affective ratings of pictures and instruction manual. Technical report A-8. University of Florida, Gainesville, FL.

Larson, M., Soleymani, M., Serdyukov, P., et al. (2011). Automatic tagging and geotagging in video collections and communities. In Proceedings of the 1st ACM International Conference on Multimedia Retrieval (pp. 51:1–51:8).

McDuff, D., El Kaliouby, R., Demirdjian, D., & Picard, R. (2013). Predicting online media effectiveness based on smile responses gathered over the Internet. In Proceedings of 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (pp. 1–7).

McDuff, D., El Kaliouby, R., & Picard, R. W. (2012). Crowdsourcing facial responses to online videos. IEEE Transactions on Affective Computing, 3(4), 456–468.

Moshfeghi, Y. & Jose, J. M. (2013). An effective implicit relevance feedback technique using affective, physiological and behavioural features. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 133–142).

Pantic, M. & Vinciarelli, A. (2009). Implicit human-centered tagging. IEEE Signal Processing Magazine, 26(6), 173–180.

Petridis, S. & Pantic, M. (2009). Is this joke really funny? Judging the mirth by audiovisual laughter analysis. In IEEE International Conference on Multimedia and Expo (pp. 1444–1447).

Salojärvi, J., Puolamäki, K., & Kaski, S. (2005). Implicit relevance feedback from eye movements. In W. Duch, J. Kacprzyk, E. Oja, & S. Zadrozny (Eds), Artificial Neural Networks: Biological Inspirations ICANN 2005 (vol. 3696, pp. 513–518). Berlin: Springer.

Shan, M. K., Kuo, F. F., Chiang, M. F., & Lee, S. Y. (2009). Emotion-based music recommendation by affinity discovery from film music. Expert Systems with Applications, 36(4), 7666–7674.

Shen, X., Tan, B., & Zhai, C. (2005). Context-sensitive information retrieval using implicit feedback. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 43–50).

Silveira, F., Eriksson, B., Sheth, A., & Sheppard, A. (2013). Predicting audience responses to movie content from electro-dermal activity signals. In Proceedings of the 2013 ACM Conference on Ubiquitous Computing.

Soleymani, M., Chanel, G., Kierkels, J. J. M., & Pun, T. (2009). Affective characterization of movie scenes based on content analysis and physiological changes. International Journal of Semantic Computing, 3(2), 235–254.

Soleymani, M., Kaltwang, S., & Pantic, M. (2013). Human behavior sensing for tag relevance assessment. In Proceedings of the 21st ACM International Conference on Multimedia.

Soleymani, M., Koelstra, S., Patras, I., & Pun, T. (2011). Continuous emotion detection in response to music videos. In Proceedings of IEEE International Conference on Automatic Face Gesture Recognition and Workshops (pp. 803–808).

Soleymani, M., Larson, M., Pun, T., & Hanjalic, A. (2014). Corpus development for affective video indexing. IEEE Transactions on Multimedia, 16(4), 1075–1089.

Soleymani, M., Lichtenauer, J., Pun, T., & Pantic, M. (2012). A multimodal database for affect recognition and implicit tagging. IEEE Transactions on Affective Computing, 3, 42–55.

Soleymani, M. & Pantic, M. (2012). Human-centered implicit tagging: Overview and perspectives. In Proceedings of IEEE International Conference on Systems, Man and Cybernetics (pp. 3304–3309).

Soleymani, M. & Pantic, M. (2013). Multimedia implicit tagging using EEG signals. In Proceedings of IEEE International Conference on Multimedia and Expo.

Soleymani, M., Pantic, M., & Pun, T. (2012). Multimodal emotion recognition in response to videos. IEEE Transactions on Affective Computing, 3(2), 211–223.

Tkalcic, M., Burnik, U., & Košir, A. (2010). Using affective parameters in a content-based recommender system for images. User Modeling and User-Adapted Interaction, 20(4), 279–311.

Tkalcic, M., Odic, A., Košir, A., & Tasic, J. (2013). Affective labeling in a content-based recommender system for images. IEEE Transactions on Multimedia, 15(2), 391–400.

Tkalcic, M., Tasic, J., & Košir, A. (2010). The LDOS-PerAff-1 corpus of face video clips with affective and personality metadata. In Proceedings of Multimodal Corpora Advances in Capturing Coding and Analysing Multimodality (pp. 111–115).

Vrochidis, S., Patras, I., & Kompatsiaris, I. (2011). An eye-tracking-based approach to facilitate interactive video search. In Proceedings of the 1st ACM International Conference on Multimedia Retrieval (pp. 43:1–43:8).

Yannakakis, G. N., & Hallam, J. (2011). Ranking vs. preference: A comparative study of self-reporting. In S. D’Mello, A. Graesser, B. Schuller, & J.-C. Martin (Eds), Affective Computing and Intelligent Interaction (vol. 6974, pp. 437–446). Berlin: Springer.
