
19 - Arabic Corpus Linguistics and Related Tools

An Overview and Some Critical Observations

from Part IV - Arabic Computational and Corpus Linguistics

Published online by Cambridge University Press: 23 September 2021

Karin Ryding, Georgetown University, Washington DC
David Wilmsen, American University of Beirut

Summary

Mark Van Mol provides a critical review of the issues involved in constructing usable Arabic corpora and of the solutions that programmers have attempted. One such issue is whether a corpus is made freely available or placed behind a paywall. This distinction often correlates with corpus size as well: freely available corpora are generally larger and untagged for parts of speech (POS), whereas those behind paywalls are smaller and POS-tagged. The reason is clear: POS tagging requires large amounts of painstaking labour, whereas harvesting large amounts of text from the Internet with web-scraping applications can be done in seconds. Corpus sizes are themselves difficult to compare, because they are quantified in different units: size may be expressed as the number of articles, hours, tokens, kilobytes, megabytes, sentences, words, and sometimes paragraphs that the corpus encompasses. One reason for this variety is that defining the searchable units of Arabic texts presents complications. Such considerations bear directly on questions of corpus representativeness, which in turn raise the question of what the corpora are intended to represent: Classical Arabic, modern written Arabic, or Arabic dialects.
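The difficulty of comparing corpus sizes can be illustrated with a minimal Python sketch that measures one short Arabic sample in several of the units mentioned in the summary. The function names `corpus_size` and `naive_segments`, the sample sentence, and the clitic-splitting rule are all assumptions made for illustration; the splitter is a deliberately crude heuristic, not a real morphological analyser and not a method described in the chapter.

```python
import re

PUNCT = ".!?\u061F\u060C"  # Latin marks plus Arabic question mark and comma


def naive_segments(word: str) -> list:
    """Toy clitic splitter (illustrative assumption, not a real analyser)."""
    word = word.strip(PUNCT)
    parts = []
    # Split off the conjunction wa- (U+0648) when enough letters remain.
    if word.startswith("\u0648") and len(word) > 3:
        parts.append(word[0])
        word = word[1:]
    # Split off the definite article al- (U+0627 U+0644) likewise.
    if word.startswith("\u0627\u0644") and len(word) > 4:
        parts.append(word[:2])
        word = word[2:]
    if word:
        parts.append(word)
    return parts


def corpus_size(text: str) -> dict:
    """Report the size of one text in several commonly quoted units."""
    # Whitespace-delimited strings: the naive notion of a "word".
    words = text.split()
    # Sentences: split on Latin and Arabic sentence-final punctuation.
    sentences = [s for s in re.split(r"[.!?\u061F]+", text) if s.strip()]
    # Tokens after the toy clitic segmentation above.
    tokens = [seg for w in words for seg in naive_segments(w)]
    return {
        "characters": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "words": len(words),
        "segmented_tokens": len(tokens),
        "sentences": len(sentences),
    }


# Hypothetical two-sentence sample, used only for illustration.
sample = "وكتب الولد الدرس. هل قرأ الكتاب؟"
sizes = corpus_size(sample)
print(sizes)
```

Even on two short sentences, the whitespace word count and the segmented token count differ, and the byte count exceeds the character count because Arabic letters occupy more than one byte in UTF-8. This is why sizes reported in different units are not directly comparable.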

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2021


