Using linguistically defined specific details to detect deception across domains

  • Nikolai Vogler and Lisa Pearl

Abstract

Current automatic deception detection approaches tend to rely on cues that are based either on specific lexical items or on linguistically abstract features that are not necessarily motivated by the psychology of deception. Notably, while approaches relying on such features can do well when the content domain is similar for training and testing, they suffer when content changes occur. We investigate new linguistically defined features that aim to capture specific details, a psychologically motivated aspect of truthful versus deceptive language that may be diagnostic across content domains. To ascertain the potential utility of these features, we evaluate them on data sets representing a broad sample of deceptive language, including hotel reviews, opinions about emotionally charged topics, and answers to job interview questions. We additionally evaluate these features as part of a deception detection classifier. We find that these linguistically defined specific detail features are most useful for cross-domain deception detection when the training data differ significantly in content from the test data, and particularly benefit classification accuracy on deceptive documents. We discuss implications of our results for general-purpose approaches to deception detection.

Corresponding author

*Corresponding author. Email: nikolai.vogler@gmail.com
