A Framework for the Unsupervised and Semi-Supervised Analysis of Visual Frames

Michelle Torres

doi:10.1017/pan.2023.32

Published online by Cambridge University Press: 23 October 2023

Michelle Torres*
Affiliation: Assistant Professor, Department of Political Science, University of California, Los Angeles, Los Angeles, CA, USA.
*Email: smtorres@ucla.edu

Abstract

This article introduces to political science a framework for analyzing the content of visual material through unsupervised and semi-supervised methods. It details the implementation of a tool from the computer vision field, the Bag of Visual Words (BoVW), for the definition and extraction of “tokens” that allow researchers to build an Image-Visual Word Matrix, which emulates the Document-Term Matrix in text analysis. This reduction technique is the basis for several tools familiar to social scientists, such as topic models, that permit exploratory and semi-supervised analysis of images. Compared with deep learning techniques, the framework offers gains in transparency, interpretability, and the inclusion of domain knowledge. I illustrate the scope of the BoVW by estimating a novel visual structural topic model that focuses substantively on identifying visual frames in pictures of the migrant caravan from Central America.
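
The construction described above can be illustrated with a minimal sketch. It is not the article's implementation: it assumes flattened grayscale patches as stand-in local descriptors (rather than a keypoint detector and descriptor), a plain Lloyd's k-means to form the visual vocabulary, and synthetic random images. All function names and parameters (`extract_patches`, `kmeans`, `assign`, `image_visual_word_matrix`, `k`, `size`, `stride`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def extract_patches(image, size=8, stride=8):
    """Cut a grayscale image into flattened size x size patches.
    Assumption: raw patches stand in for real local feature descriptors."""
    h, w = image.shape
    patches = [
        image[r:r + size, c:c + size].ravel()
        for r in range(0, h - size + 1, stride)
        for c in range(0, w - size + 1, stride)
    ]
    return np.asarray(patches, dtype=float)

def assign(X, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; the k centers play the role of visual words."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's center unchanged
                centers[j] = members.mean(axis=0)
    return centers

def image_visual_word_matrix(images, k=10, size=8, stride=8, seed=0):
    """Rows are images, columns are visual words, cells are counts."""
    per_image = [extract_patches(img, size, stride) for img in images]
    vocab = kmeans(np.vstack(per_image), k, seed=seed)
    ivwm = np.zeros((len(images), k), dtype=int)
    for i, desc in enumerate(per_image):
        ivwm[i] = np.bincount(assign(desc, vocab), minlength=k)
    return ivwm, vocab

# Synthetic example: five random 32 x 32 "images" (illustrative data only).
rng = np.random.default_rng(1)
images = [rng.random((32, 32)) for _ in range(5)]
ivwm, vocab = image_visual_word_matrix(images, k=10)
print(ivwm.shape)  # one row per image, one column per visual word
```

Each row of `ivwm` counts how often each visual word appears in one image, just as a row of a Document-Term Matrix counts terms in one document, so the matrix could in principle be passed to a topic model in place of word counts.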

Keywords

Computational methods; visual framing; visual structural topic model; unstructured data

Type: Article
Information: Political Analysis, Volume 32, Issue 2, April 2024, pp. 199–220

DOI: https://doi.org/10.1017/pan.2023.32
Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of the Society for Political Methodology

Footnotes

Edited by: Daniel Hopkins

Torres Dataset

https://doi.org/10.7910/DVN/PZYLYU
