
Tackling challenges of neural purchase stage identification from imbalanced Twitter data

Published online by Cambridge University Press:  15 August 2019

Heike Adel*
Affiliation:
Institute for Natural Language Processing, University of Stuttgart, Pfaffenwaldring 5b, 70569 Stuttgart, Germany
Francine Chen
Affiliation:
FX Palo Alto Laboratory, 3174 Porter Dr, Palo Alto, CA 94304, USA
Yan-Ying Chen
Affiliation:
FX Palo Alto Laboratory, 3174 Porter Dr, Palo Alto, CA 94304, USA
*Corresponding author. Email: heike.adel@ims.uni-stuttgart.de

Abstract

Twitter and other social media platforms are often used to share interest in products. Identifying purchase decision stages, such as those of the AIDA model (Awareness, Interest, Desire, and Action), can enable more personalized e-commerce services and finer-grained ad targeting than predicting purchase intent alone. In this paper, we propose and analyze neural models for identifying the purchase stage of single tweets in a user’s tweet sequence. In particular, we identify three challenges of purchase stage identification: an imbalanced label distribution with a high number of non-purchase-stage instances, a limited amount of training data, and domain adaptation with little or no target-domain data. Our experiments reveal that the imbalanced label distribution is the main challenge for our models. We address it with a ranking loss and investigate the performance of our models on the different output classes in detail. To improve the generalization of the models and augment the limited amount of training data, we examine the use of sentiment analysis as a complementary, secondary task in a multitask framework. For applying our models to tweets from another product domain, we consider two scenarios: without any labeled data in the target product domain, learning domain-invariant representations with adversarial training is most promising, while with a small number of labeled target examples, fine-tuning the source model weights performs best. Finally, we conduct several analyses, including extracting attention weights and representative phrases for the different purchase stages. The results suggest that the model learns features indicative of purchase stages and that its confusion errors are sensible.
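As an illustration of the kind of ranking loss the abstract refers to for handling the dominant non-purchase-stage class, the following is a minimal sketch in the style of the pairwise ranking loss of dos Santos et al. (2015): the gold class score is pushed above a positive margin and the most competitive wrong class is pushed below a negative margin, and for the dominant "no purchase stage" class only the negative term is kept. The function name, margins, and scaling factor here are illustrative choices, not the paper's actual hyperparameters.

```python
import numpy as np

def ranking_loss(scores, gold, none_class=0, gamma=2.0, m_pos=2.5, m_neg=0.5):
    """Pairwise ranking loss sketch (style of dos Santos et al. 2015).

    scores: 1-D array of class scores for one example.
    gold: index of the gold class.
    gamma, m_pos, m_neg: scaling factor and margins (illustrative values).
    """
    # Highest-scoring incorrect class: the most competitive negative.
    s_neg = np.delete(scores, gold).max()
    # Penalize the competitive wrong class for scoring above -m_neg.
    loss = np.log1p(np.exp(gamma * (m_neg + s_neg)))
    if gold != none_class:
        # Reward the gold class for scoring above m_pos -- but skip this
        # term for the dominant "none" class, so the model is never forced
        # to assign it a high score.
        loss += np.log1p(np.exp(gamma * (m_pos - scores[gold])))
    return loss
```

With this formulation, the frequent non-purchase-stage class only contributes a "push the wrong classes down" signal, which is one way to keep it from overwhelming the rare purchase-stage classes during training.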

Type
Article
Copyright
© Cambridge University Press 2019

