
POS-tagging Arabic texts: A novel approach based on ant colony

Published online by Cambridge University Press: 11 February 2016

CHIRAZ ZRIBI BEN OTHMANE
Affiliation:
RIADI laboratory, National School of Computer Sciences, La Manouba University, Tunisia e-mails: Chiraz.BenOthmane@riadi.rnu.tn, Feriel.BenFraj@riadi
FERIEL BEN FRAJ
Affiliation:
RIADI laboratory, National School of Computer Sciences, La Manouba University, Tunisia e-mails: Chiraz.BenOthmane@riadi.rnu.tn, Feriel.BenFraj@riadi
ICHRAF LIMAM
Affiliation:
Faculty of Science of Bizerte, Carthage University, Tunisia e-mail: Ichraf.Limam@fsb.rnu.tn

Abstract

The specificities of the Arabic language, mainly agglutination and vocalization, make the task of POS-tagging more difficult than for Indo-European languages. Consequently, POS-tagging texts with good accuracy remains a challenging problem for Arabic language processing applications. In this work, we consider the task of POS-tagging as an optimization problem modeled as a graph whose nodes correspond to all possible grammatical tags given by a morphological analyzer for the words in a sentence; the goal is to find the best path (sequence of tags) in this graph. To solve this problem, we propose a novel approach based on ant colony. Ant colony-based algorithms are among the most efficient methods for solving optimization problems modeled as a graph. The collaboration of ants with varied knowledge creates a collective intelligence and increases efficiency. We performed experiments on both vocalized and non-vocalized texts and tested two different tagsets containing fine-grained and coarse-grained composite tags. The results showed good accuracy rates and, hence, the benefits of swarm intelligence for the POS-tagging problem.
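To make the graph formulation concrete, the following is a minimal sketch of a generic ant colony search over a tag lattice. It is not the authors' algorithm, whose details are not given in this abstract. The function name `aco_tag`, the tag names, the transition scores, and every parameter value are assumptions chosen for illustration. Each word is assumed to have at least one candidate tag, as a morphological analyzer would supply.

```python
import random


def aco_tag(candidates, transition, n_ants=20, n_iters=30, rho=0.1,
            deposit=1.0, seed=0):
    """Search a tag lattice with a simple ant colony scheme (illustrative only).

    candidates: list with one non-empty list of candidate tags per word.
    transition: dict mapping (previous_tag, tag) to a positive score;
                "<s>" marks the sentence start. Missing pairs get 0.01.
    Returns (best_path, best_score).
    """
    rng = random.Random(seed)
    # One pheromone value per lattice node, i.e. per (position, tag) pair.
    pheromone = [{tag: 1.0 for tag in tags} for tags in candidates]
    best_path, best_score = None, float("-inf")

    def path_score(path):
        # Product of transition scores along the tag sequence.
        score, prev = 1.0, "<s>"
        for tag in path:
            score *= transition.get((prev, tag), 0.01)
            prev = tag
        return score

    for _ in range(n_iters):
        for _ in range(n_ants):
            path, prev = [], "<s>"
            for i, tags in enumerate(candidates):
                # Choice weight = pheromone on the node * local heuristic.
                weights = [pheromone[i][t] * transition.get((prev, t), 0.01)
                           for t in tags]
                tag = rng.choices(tags, weights=weights, k=1)[0]
                path.append(tag)
                prev = tag
            score = path_score(path)
            if score > best_score:
                best_path, best_score = path, score
        # Evaporation: every node loses a fraction rho of its pheromone.
        for node in pheromone:
            for tag in node:
                node[tag] *= 1.0 - rho
        # Reinforcement: the best path found so far receives a deposit.
        for i, tag in enumerate(best_path):
            pheromone[i][tag] += deposit
    return best_path, best_score


if __name__ == "__main__":
    # Hypothetical three-word sentence with ambiguous second and third words.
    candidates = [["DET"], ["NOUN", "VERB"], ["VERB", "NOUN"]]
    transition = {
        ("<s>", "DET"): 0.9,
        ("DET", "NOUN"): 0.8, ("DET", "VERB"): 0.1,
        ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.2,
        ("VERB", "NOUN"): 0.5, ("VERB", "VERB"): 0.1,
    }
    print(aco_tag(candidates, transition))
```

In this sketch the pheromone lives on the lattice nodes and only the best path found so far is reinforced; other variants place pheromone on edges or let every ant deposit in proportion to its path quality.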

Type
Articles
Copyright
Copyright © Cambridge University Press 2016 
