Generating Arabic TAG for syntax-semantics analysis

Cherifa Ben Khelil; Chiraz Ben Othmane Zribi; Denys Duchier; Yannick Parmentier

doi:10.1017/S1351324922000109

Published online by Cambridge University Press: 24 March 2022

Cherifa Ben Khelil*: LIFAT, Université de Tours, Tours 37200, France
Chiraz Ben Othmane Zribi: RIADI, ENSI, Université La Manouba, La Manouba, Tunisia
Denys Duchier: LIFO, Université d’Orléans, Orléans, France
Yannick Parmentier: LORIA, Projet SYNALP, Université de Lorraine, Vandoeuvre-les-Nancy, France
*Corresponding author. E-mail: cherifa.bk@gmail.com

Abstract

Arabic presents many challenges for automatic processing. Although several studies have addressed some of these issues, electronic resources for processing Arabic remain relatively rare or not widely available. In this paper, we propose a tree-adjoining grammar (TAG) with a syntax-semantics interface. It is applied to Modern Standard Arabic, but it can be easily adapted to other languages. This grammar, named “ArabTAG V2.0” (Arabic Tree Adjoining Grammar), is semi-automatically generated from an abstract representation called a meta-grammar. To support its development, ArabTAG V2.0 benefits from a grammar testing environment that uses a corpus of phenomena. Further experiments were performed to check the coverage of this grammar as well as the syntax-semantics analysis. The results showed that ArabTAG V2.0 can cover the majority of syntactic structures and different linguistic phenomena with a precision rate of 88.76%. Moreover, we were able to semantically analyze sentences and build their semantic representations with a precision rate of about 95.63%.

Keywords

Tree-adjoining grammar; Meta-grammar; Syntax; Semantic; Syntax/semantic interface; Semantic frames; Arabic language

Type: Article
Information: Natural Language Engineering, Volume 29, Issue 2, March 2023, pp. 386–424

DOI: https://doi.org/10.1017/S1351324922000109
Copyright: © The Author(s), 2022. Published by Cambridge University Press
