A survey on text mining in social networks


In this survey, we review text mining techniques for discovering textual patterns in social networking sites. Social network applications, through features such as chat, comments, and discussion boards, create opportunities for interaction among people, leading to mutual learning and the sharing of valuable knowledge. Data on social networking websites is inherently unstructured and fuzzy. In everyday conversation, people pay little attention to spelling and grammatical accuracy, which may lead to different types of ambiguity: lexical, syntactic, and semantic. Analyzing and extracting information patterns from such data sets is therefore more complex. Several surveys have analyzed methods for information extraction. Most of them emphasize the application of text mining techniques to unstructured data sets that reside in the form of text documents, but do not specifically target data sets from social networking websites. This survey attempts to provide a thorough understanding of different text mining techniques as well as their application to social networking websites. It investigates recent advances in text analysis and covers two basic approaches to text mining, classification and clustering, which are widely used to explore the unstructured text available on the Web.

