POS-tagging arabic texts: A novel approach based on ant colony

CHIRAZ ZRIBI BEN OTHMANE; FERIEL BEN FRAJ; ICHRAF LIMAM

doi:10.1017/S1351324915000480

Published online by Cambridge University Press: 11 February 2016

CHIRAZ ZRIBI BEN OTHMANE: Affiliation:
RIADI laboratory, National School of Computer Sciences, La Manouba University, Tunisia e-mails: Chiraz.BenOthmane@riadi.rnu.tn, Feriel.BenFraj@riadi
FERIEL BEN FRAJ: Affiliation:
RIADI laboratory, National School of Computer Sciences, La Manouba University, Tunisia e-mails: Chiraz.BenOthmane@riadi.rnu.tn, Feriel.BenFraj@riadi
ICHRAF LIMAM: Affiliation:
Faculty of Science of Bizerte, Carthage University, Tunisia e-mail: Ichraf.Limam@fsb.rnu.tn

Abstract

The specificities of the Arabic language, mainly agglutination and vocalization, make POS-tagging more difficult than for Indo-European languages. Consequently, POS-tagging texts with good accuracy remains a challenging problem for Arabic language processing applications. In this work, we treat POS-tagging as an optimization problem modeled as a graph whose nodes correspond to all possible grammatical tags given by a morphological analyzer for the words in a sentence; the goal is to find the best path (sequence of tags) in this graph. To solve this problem, we propose a novel approach based on ant colony. Ant colony-based algorithms are among the most efficient methods for solving optimization problems modeled as graphs. The collaboration of ants with different knowledge creates a collective intelligence and increases efficiency. We performed experiments on both vocalized and non-vocalized texts and tested two different tagsets containing fine-grained and coarse-grained composite tags. The results show good accuracy rates and, hence, the benefits of swarm intelligence for the POS-tagging problem.
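
To make the graph formulation concrete, the following is a minimal sketch of a generic ant colony search over a tag lattice. It is not the authors' algorithm: the abstract gives no details of their heuristic, pheromone rule, or ant types, so the function `aco_tag`, the parameters `n_ants`, `n_iters`, and `rho`, the start symbol `<s>`, the tag names, and the transition scores in `TOY_SCORES` are all assumptions made for illustration.

```python
import random

def aco_tag(lattice, score, n_ants=20, n_iters=30, rho=0.1, seed=0):
    """Choose one tag per word from `lattice` (a list of candidate-tag lists).

    `score(prev_tag, tag)` is a hypothetical positive transition score;
    it stands in for whatever knowledge the real system would use.
    """
    rng = random.Random(seed)
    start = "<s>"  # assumed sentence-start symbol

    # One pheromone value per lattice edge: (position, previous tag, tag).
    tau = {}
    for i, cands in enumerate(lattice):
        prevs = [start] if i == 0 else lattice[i - 1]
        for p in prevs:
            for t in cands:
                tau[(i, p, t)] = 1.0

    def quality(path):
        # Path quality: product of the transition scores along the path.
        q, prev = 1.0, start
        for t in path:
            q *= score(prev, t)
            prev = t
        return q

    best_path, best_q = None, float("-inf")
    for _ in range(n_iters):
        paths = []
        for _ in range(n_ants):
            # Each ant walks left to right, picking a tag with probability
            # proportional to pheromone times heuristic score.
            prev, path = start, []
            for i, cands in enumerate(lattice):
                weights = [tau[(i, prev, t)] * score(prev, t) for t in cands]
                tag = rng.choices(cands, weights=weights, k=1)[0]
                path.append(tag)
                prev = tag
            paths.append(path)

        # Evaporation, then deposit proportional to each path's quality.
        for edge in tau:
            tau[edge] *= 1.0 - rho
        for path in paths:
            q, prev = quality(path), start
            for i, t in enumerate(path):
                tau[(i, prev, t)] += q
                prev = t
            if q > best_q:
                best_path, best_q = path, q
    return best_path

# Toy data (hypothetical): three words, the last two ambiguous.
TOY_SCORES = {
    ("<s>", "DET"): 1.0,
    ("DET", "NOUN"): 0.9, ("DET", "VERB"): 0.1,
    ("NOUN", "VERB"): 0.8, ("NOUN", "NOUN"): 0.2,
    ("VERB", "NOUN"): 0.5, ("VERB", "VERB"): 0.1,
}

def toy_score(prev, tag):
    return TOY_SCORES.get((prev, tag), 0.01)

LATTICE = [["DET"], ["NOUN", "VERB"], ["NOUN", "VERB"]]
print(aco_tag(LATTICE, toy_score))
```

Under these toy scores the highest-quality path is DET, NOUN, VERB (quality 0.72). In a real setting the candidate lists would come from a morphological analyzer and the scores from a tagged corpus.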

Type: Articles
Information: Natural Language Engineering, Volume 23, Issue 3, May 2017, pp. 419-439

DOI: https://doi.org/10.1017/S1351324915000480
Copyright: © Cambridge University Press 2016
