
Tackling challenges of neural purchase stage identification from imbalanced Twitter data

Published online by Cambridge University Press:  15 August 2019

Heike Adel*
Affiliation:
Institute for Natural Language Processing, University of Stuttgart, Pfaffenwaldring 5b, 70569 Stuttgart, Germany
Francine Chen
Affiliation:
FX Palo Alto Laboratory, 3174 Porter Dr, Palo Alto, CA 94304, USA
Yan-Ying Chen
Affiliation:
FX Palo Alto Laboratory, 3174 Porter Dr, Palo Alto, CA 94304, USA
*Corresponding author. Email: heike.adel@ims.uni-stuttgart.de

Abstract

Twitter and other social media platforms are often used to share interest in products. Identifying purchase decision stages, such as those of the AIDA model (Awareness, Interest, Desire, and Action), can enable more personalized e-commerce services and finer-grained ad targeting than predicting purchase intent alone. In this paper, we propose and analyze neural models for identifying the purchase stage of single tweets in a user’s tweet sequence. In particular, we identify three challenges of purchase stage identification: an imbalanced label distribution with a high number of non-purchase-stage instances, a limited amount of training data, and domain adaptation with little or no target-domain data. Our experiments reveal that the imbalanced label distribution is the main challenge for our models. We address it with a ranking loss and investigate the performance of our models on the different output classes in detail. To improve the generalization of the models and augment the limited amount of training data, we examine the use of sentiment analysis as a complementary, secondary task in a multitask framework. For applying our models to tweets from another product domain, we consider two scenarios: without any labeled data in the target product domain, learning domain-invariant representations with adversarial training is most promising, while with a small number of labeled target examples, fine-tuning the source model weights performs best. Finally, we conduct several analyses, including extracting attention weights and representative phrases for the different purchase stages. The results suggest that the model learns features indicative of purchase stages and that its confusion errors are sensible.
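As an illustration of the kind of ranking loss the abstract refers to for handling the dominant non-purchase-stage class, the following is a minimal sketch in the style of the pairwise ranking loss of dos Santos et al. (2015): the gold class score is pushed above a positive margin and the most competitive wrong class is pushed below a negative margin, and for the dominant "no purchase stage" class only the negative term is kept. The function name, margins, and scaling factor here are illustrative choices, not the paper's actual hyperparameters.

```python
import numpy as np

def ranking_loss(scores, gold, none_class=0, gamma=2.0, m_pos=2.5, m_neg=0.5):
    """Pairwise ranking loss sketch (style of dos Santos et al. 2015).

    scores: 1-D array of class scores for one example.
    gold: index of the gold class.
    gamma, m_pos, m_neg: scaling factor and margins (illustrative values).
    """
    # Highest-scoring incorrect class: the most competitive negative.
    s_neg = np.delete(scores, gold).max()
    # Penalize the competitive wrong class for scoring above -m_neg.
    loss = np.log1p(np.exp(gamma * (m_neg + s_neg)))
    if gold != none_class:
        # Reward the gold class for scoring above m_pos -- but skip this
        # term for the dominant "none" class, so the model is never forced
        # to assign it a high score.
        loss += np.log1p(np.exp(gamma * (m_pos - scores[gold])))
    return loss
```

With this formulation, the frequent non-purchase-stage class only contributes a "push the wrong classes down" signal, which is one way to keep it from overwhelming the rare purchase-stage classes during training.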

Type
Article
Copyright
© Cambridge University Press 2019

