Arabic Computational and Corpus Linguistics

Part IV - Arabic Computational and Corpus Linguistics

Published online by Cambridge University Press: 23 September 2021

Edited by Karin Ryding and David Wilmsen

Karin Ryding: Georgetown University, Washington DC
David Wilmsen: American University of Beirut

Type: Chapter
Information: The Cambridge Handbook of Arabic Linguistics, pp. 425–504

DOI: https://doi.org/10.1017/9781108277327

Publisher: Cambridge University Press

Print publication year: 2021

References

Abdelnour, J. (1983). Dictionnaire Arabe–Français. Bayreuth: Dar el-Ilm lil-Malayin.Google Scholar

Abouenour, L., Bouzoubaa, K., and Rosso, P. (2013). On the evaluation and improvement of Arabic WordNet coverage and usability. Language Resources and Evaluation 47, 891–917.CrossRef Google Scholar

Ad-Dahdah, A. (1990). Muʿjam qawāʿid al-ʿarabiyya al-ʿālamiyya [A Dictionary of Universal Arabic Grammar]. Beirut: Maktabat Lubnan.Google Scholar

Adouane, W. and Dobnik, S. (2017). Identification of languages in Algerian Arabic multilingual documents. In Habash, N., Diab, M., Darwish, K. et al., eds., Proceedings of the Third Arabic Natural Language Processing Workshop. Valencia: Association for Computational Linguistics, 1–8.Google Scholar

Al-Badrashiny, M. (2017). Layered language model based hybrid approach to automatic full diacritization of Arabic. In Habash, N., Diab, M., Darwish, K. et al., eds., Proceedings of the Third Arabic Natural Language Processing Workshop. Valencia: Association for Computational Linguistics, 177–84.Google Scholar

Alfaifi, A. (2015). Building the Arabic Learner Corpus and a System for Arabic Error Annotation. PhD thesis, University of Leeds, School of Computing.Google Scholar

Alhawiti, K. (2014). Adaptive Models of Arabic Text. PhD dissertation, Bangor University, Wales, UK.Google Scholar

Alkhazi, I. (2017). Classifying and segmenting Classical and Modern Standard Arabic using minimum cross-entropy. International Journal of Advanced Computer Science and Applications, 8(4), 421–30.Google Scholar

Al-Marwani, N. and Diab, M. (2017). Arabic textual entailment with word embeddings. In Habash, N., Diab, M., Darwish, K. et al., eds., Proceedings of the Third Arabic Natural Language Processing Workshop, Valencia: Association for Computational Linguistics, 177–84.Google Scholar

Almujaiwel, S. (2017). Discursive patterns of anti-feminism and pro-feminism in Arabic newspapers of the KACST corpus. Discourse & Communication, 11(5), 441–66.CrossRef Google Scholar

Al-Najem, T. (2007). Inheritance-based approach to Arabic verbal root-and-pattern morphology. In Soudi, A., van den Bosch, A., and Neumann, G., eds., Arabic Computational Morphology: Knowledge-Based and Empirical Methods. Dordrecht: Springer, 67–87.Google Scholar

Alosaimy, A. and Atwell, E. (2017). Tagging Classical Arabic text using available morphological analysers and part of speech taggers. Journal for Language Technology and Computational Linguistics, 32(1), 1–26.Google Scholar

Alqassas, A. (2017). Arabic diglossia and heritage language acquisition: Remarks on acquisition planning. In Mehdat-Lecocq, H., ed., Arabe standard et variations regionals, Quelle(s) politique(s) linguistique(s)? Quelle(s) didactique(s)? Paris: Éditions des archives contemporaires, 81–97.Google Scholar

Al-Sayed, A., Hammo, B., and Yagi, S. (2017). Construction of an English–Arabic political parallel corpus. in Proceedings of the New Trends in Information Technology (NTIT-2017). Amman: The University of Jordan.Google Scholar

Al-Shargi, F. and Rambow, O. (2015). DIWAN: A dialectal word annotation tool for Arabic. In Habash, N., Vogel, S., and Darwish, K., eds., Proceedings of the Second Workshop on Arabic Natural Language Processing. Beijing: Association for Computational Linguistics, 49–58.Google Scholar

Alshutayri, A. and Atwell, E. (2017). Exploring Twitter as a source of an Arabic dialect corpus. International Journal of Computational Linguistics, 8(2), 37–44.Google Scholar

Al-Thubaity, A. and Almujaiwel, S. (2017). A quantitative inquiry into the keywords between primary and reference Arabic corpora. Journal of Quantitative Linguistics 25(2), 121–41. DOI: 10.1080/09296174.2017.1359883, 1–20.Google Scholar

Badawi, E., Carter, M. G., and Gully, A. (2003). Modern Written Arabic: A Comprehensive Grammar. London: Routledge.Google Scholar

Bernardi, F., Chakhaia, L., and Leopold, L. (2017). ‘Sing me a song with social significance’: The (mis)use of statistical significance testing in European sociological research. European Sociological Review, 33(1), 1–15.Google Scholar

Biadsy, F., Hirschberg, J., and Habash, N. (2009). Spoken Arabic dialect identification using phonotactic modeling. In Rosner, M. and Shuly, W., eds., Proceedings of the EACL Workshop on Computational Approaches to Semitic Languages, Athens, ACL, Stroudsburg, PA, USA, 53–61.Google Scholar

Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8(4), 243–57.Google Scholar

Blanc, H. (1960). Style variations in Spoken Arabic: A sample of interdialectal educated conversation. In Ferguson, C., Contributions to Arabic Linguistics. Cambridge, MA: Harvard University Press, 81–161.Google Scholar

Bouamor, H., Habash, N., and Oflazer, K. (2014). A Multidialectal Parallel Corpus of Arabic. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’ 14), European Language Resources Association. (ELRA) Reykjavik, Iceland, 1240–5.Google Scholar

Boudchiche, M., Mazroui, A., Ould Bebah, M. O. M., Lakhouaja, A., and Boudlal, A. (2017). AlKhalil Morpho Sys 2: A robust Arabic morpho-syntactic analyzer. Journal of King Saud University – Computer and Information Sciences, 29(2), 141–6.Google Scholar

Boudelaa, S. and Marslen-Wilson, W. (2010). Aralex: A lexical database for Modern Standard Arabic. Behavior Research Methods, 42(2), 481. https://aralex.mrc-cbu.cam.ac.uk/aralex.online/.Google Scholar

Bougrine, S. Chorana, A., Lakhdari, A., and Cherroun, H. (2017). Toward a web-based speech corpus for Algerian Arabic dialectal varieties. In Habash, N., Diab, M., Darwish, K. et al., eds., Proceedings of the Third Arabic Natural Language Processing Workshop. Valencia: Association for Computational Linguistics, 138–46.Google Scholar

Buchberger, E. (2009). Book review: Arabic Computational Morphology. Natural Language Engineering, 15, 309–10.Google Scholar

Buckwalter, T. (2007). Issues in Arabic morphological analysis. In Soudi, A., van den Bosch, A., and Neumann, G., eds., Arabic Computational Morphology: Knowledge-Based and Empirical Methods. Dordrecht: Springer, 23–41.Google Scholar

Buckwalter, T. and Parkinson, D. (2011). A Frequency Dictionary of Arabic Core Vocabulary for Learners, London: Routledge.Google Scholar

Cahill, L. (2007). A syllable-based account of Arabic morphology. In Soudi, A., Bosch, A., and Neumann, G., eds., Arabic Computational Morphology, Text, Speech and Language Technology, vol. 38. Dordrecht: Springer, 45–67.Google Scholar

Carter, M. G. (2004). Sibawayhi. Oxford: Oxford Centre for Islamic Studies.Google Scholar

Cleary, J. and Witten, I. (1984). Data compression using adaptive coding and partial string matching. IEEE Transactions on Communications, COM-32(4), 396–402.Google Scholar

Darwish, K. (2007). Adapting morphology for Arabic information retrieval. In Soudi, A., van den Bosch, A., and Neumann, G., eds., Arabic Computational Morphology. Knowledge-Based and Empirical Methods. Dordrecht: Springer, 245–62.Google Scholar

Darwish, K., Mubarak, H., and Abdelali, A. (2017a). Arabic diacritization: Stats, rules, and hacks. In Habash, N., Diab, M., Darwish, K. et al., eds., Proceedings of the Third Arabic Natural Language Processing Workshop, Valencia, 9–17.Google Scholar

Darwish, K., Mubarak, H., Abdelali, A., and Eldesouki, M. (2017b). Arabic POS tagging: Don’t abandon feature engineering just yet. In Habash, N., Diab, M., Darwish, K., et al. eds., Proceedings of the Third Arabic Natural Language Processing Workshop, Valencia, 130–7.Google Scholar

Diab, M., Al-Badrashiny, M., Aminian, M., Attia, M., Elfardy, H., Habash, N., et al. (2014). Tharwa: A large scale dialectal Arabic–Standard Arabic–English Lexicon. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), European Language Resources Association (ELRA) Reykjavik, Iceland, 3782–9.Google Scholar

Diab, M., Hacioglu, K., and Jurafsky, D. (2007). Automatic processing of Modern Arabic text. In Soudi, A., van den Bosch, A., and Neumann, G., eds., Arabic Computational Morphology: Knowledge-Based and Empirical Methods. Dordrecht: Springer, 159–79.Google Scholar

Dichy, J. (2002). L’enseignement de l’arabe, langue pluriglossie que dans la France d’aujourd’hui. In Bistolfi, R. and Giordan, A., eds., Les langues de la méditerranée, volume des Cahiers de Confluences Méditerranée. Paris: l’Harmattan, 313–29.Google Scholar

Dichy, J. (2017). Polyglossie de l’Arabe et subsidiarité: au-delà des confusions entraînées par la naotion de diglossie. In Mehdat-Lecocq, H., ed., Arabe standard et variations regionals, Quelle(s) politique(s) linguistique(s)? Quelle(s) didactique(s)? Paris: Éditions des archives contemporaires, 1–23.Google Scholar

Dichy, J. and Farghaly, A. (2007). Grammar–lexis relations in the computational morphology of Arabic. In Soudi, A., van den Bosch, A., and Neumann, G., eds., Arabic Computational Morphology: Knowledge-Based and Empirical Methods. Dordrecht: Springer, 115–40.Google Scholar

Ditters, E. (2013). Issues in Arabic computational linguistics. In Owens, J., ed., The Oxford Handbook of Arabic Linguistics. Oxford: Oxford University Press, 213–40.Google Scholar

Eddakrouri, A. (2018). Al-mudāwwanāt al-luġawiyyyat wa dawruha fi mu^cālajat an-nuṣūṣ al-ʿarabiyya [Arabic Corpora and Their Role in the Analysis of Arabic Texts]. Riyadh: King Abdullah bin Abdulaziz International Center for the Arabic Language.Google Scholar

El-Kah, A., Zeroual, I., and Lakhouaja, A. (2017). Application of Arabic language processing in language learning. In Proceedings of the 2nd International Conference on Big Data, Cloud and Applications, New York: Association for Computing Machinery. http://dx.doi.org/10.1145/3090354.3090390, 1–6.Google Scholar

Farghaly, A. (2010). Arabic Computational Linguistics. Stanford, CA: CSLI Publications.Google Scholar

Farghaly, A. and Shaalan, K. (2009). Arabic natural language processing: Challenges and solutions. ACM Transactions on Asian Language Information Processing (TALIP), 8(4), Article 14.Google Scholar

Fasha, M., Obeid, N., and Hammo, B. (2017). A proposed model for extracting information from Arabic-based controlled text domains. In Proceedings of the New Trends in Information Technology, Amman: University of Jordan, 86–92.Google Scholar

Fashwan, A. and Alansary, S. (2017). SHAKKIL: An automatic diacritization system for Modern Standard Arabic texts. In Habash, N., Diab, M., Darwish, K. et al., eds., Proceedings of the Third Arabic Natural Language Processing Workshop, Valencia, Association for Computational Linguistics, 84–93.Google Scholar

Habash, N. and Roth, R. (2009). CATiB: The Columbia Arabic Treebank. In Proceedings of the ACL-IJCNLP 2009, Conference Short Papers, 221–4.CrossRef Google Scholar

Habash, N., Zalmout, N., Taji, D., Hoang, H., and Alzate, M. (2017). A parallel corpus for evaluating machine translation between Arabic and European languages. In Lapata, M., Blunsom, P., and Koller, A., eds., Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, vol. 2, Short Papers, Valencia: Association for Computational Linguistics, 235–41.Google Scholar

Hajič, J., Hajivcová, E., Pajas, P., Panevová, J., Sgall, P., and Hladka, B. (2001). Prague Dependency Treebank 1.0. www.researchgate.net/publication/307174711_Prague_Dependency_Treebank_10.Google Scholar

Hinds, M. and Badawi, E. (2009). A Dictionary of Egyptian Arabic, Arabic–English. Beirut: Librairie du Liban.Google Scholar

Holes, C. (2013). Orality, culture and language. In Owens, J., ed., The Oxford Handbook of Arabic Linguistics. Oxford: Oxford University Press, 281–99.Google Scholar

Hoogland, J. (2003). Woordenboek Arabisch–Nederlands [Arabic–Dutch Dictionary]. Amsterdam: Dutch Language Union – Bulaaq.Google Scholar

Ibrahimi, K. (2017). L’arabe standard, une langue en quête de reconnaissance et de promotion. In Mehdat-Lecocq, H., ed., Arabe standard et variations regionals, Quelle(s) politique(s) linguistique(s)? Quelle(s) didactique(s)? Paris: Éditions des archives contemporaires, 25–31.Google Scholar

Jarrar, M., Habash, N., Alrimawi, F., Akra, D., and Zalmout, N. (2017). Curras: An annotated corpus for the Palestinian Arabic dialect. Language Resources and Evaluation, 51(3), 745–75.Google Scholar

Kazimirski, A. (1860). Dictionnaire Arabe–Français. Beyrouth: Librairie du Liban, 2 vols.Google Scholar

Khalifa, S., Hassan, S., and Habash, N. (2017). A morphological analyzer for Gulf Arabic verbs. In Habash, N., Diab, M., Darwish, K. et al., eds., Proceedings of the Third Arabic Natural Language Processing Workshop, Valencia, 35–44.Google Scholar

Koplenig, A. (2017). Against statistical significance testing in corpus linguistics. Corpus Linguistics and Linguistic Theory, 15(2). doi: 10.1515/cllt-2016–0036.Google Scholar

Köprü, S. and Miller, J. (2009). A unification-based approach to the morphological analysis and generation of Arabic. In Proceedings of the 3rd Workshop on Computational Approaches to Arabic Script-based Languages (CAASL3).Google Scholar

Larkey, L. S., Ballesteros, L., and Connell, M. E. (2007). Light stemming for Arabic information retrieval. In Soudi, A., van den Bosch, A., and Neumann, G., eds., Arabic Computational Morphology. Knowledge-Based and Empirical Methods. Dordrecht: Springer, 221–43.Google Scholar

Leech, G. (2007). New resources, or just better old ones? The Holy Grail of representativeness. In Hundt, M., Nesselhauf, N., and Biewer, C., eds., Corpus Linguistics and the Web. Amsterdam: Rodopi, 133–49.Google Scholar

Lelubre, X. (2017). Variations regionals et communication scientifique en arabe. In Mehdat-Lecocq, H., ed., Arabe standard et variations regionals, Quelle(s) politique(s) linguistique(s)? Quelle(s) didactique(s)? Paris: Éditions des archives contemporaires, 59–79.Google Scholar

Maamouri, M. and Bies, A. (2009). Penn Arabic Treebank Guidelines version 4.92. Tech. report, University of Pennsylvania.Google Scholar

Maamouri, M., Bies, A., Buckwalter, T., and Mekki, W. (2004). The Penn Arabic Treebank: Building a large-scale annotated Arabic corpus. In Proceedings of the NEMLAR Conference on Arabic Language Resources and Tools.Google Scholar

McCarthy, J. (1981). A prosodic theory of nonconcatenative morphology. Linguistic Inquiry 12, 373–418.Google Scholar

McEnery, T. Xiao, R., and Tono, Y. (2006). Corpus-Based Language Studies: An Advanced Resource Book. London: Routledge.Google Scholar

Mdhaffar, S. (2017). Sentiment analysis of Tunisian dialect: Linguistic resources and experiments. In Habash, N., Diab, M., Darwish, K. et al., eds., Proceedings of the Third Arabic Natural Language Processing Workshop, Valencia, 55–61.Google Scholar

Menacer, M., Mella, O., Fohr, D., Jouvet, D., Langlois, D., and Smaili, K. (2017). An enhanced automatic speech recognition system for Arabic. Proceedings of the Third Arabic Natural Language Processing Workshop. Valencia, 157–65.Google Scholar

Mohamed, E., Mohit, B., and Oflazer, K. (2012). Annotating and learning morphological segmentation of Egyptian colloquial Arabic. In Proceedings of International Conference on Language Resources and Evaluation, 873–7.Google Scholar

Muhammed, R., Farrag, M., Elshamly, N., and Abdel-Ghaffar, N. (2011). Summary of Arabizi or Romanization: The dilemma of writing texts. in Proceedings of Jil Jaded Conference, University of Texas at Austin, 18–19 February (2011).Google Scholar

Nagoudi, E. and Schwab, D. (2017). Semantic similarity of Arabic sentences with word embeddings. In N. Habash, M. Diab, K. Darwish et al., eds., Proceedings of the Third Arabic Natural Language Processing Workshop. Valencia, 18–24.Google Scholar

Parkinson, D. (2001). Future variability: A corpus study of Arabic future particles. In Parkinson, D. and Farwaneh, S., eds., Perspectives on Arabic Linguistics XV. Amsterdam: Benjamins, 191–211.Google Scholar

Pinon, C. (2017). Intégrer les variations dans l’enseignement de l’arabe langue étrangère: enjeux et méthodes. In Mehdat-Lecocq, H., ed., Arabe standard et variations regionals, Quelle(s) politique(s) linguistique(s)? Quelle(s) didactique(s)? Paris: Éditions des archives contemporaires, 1–23.Google Scholar

Ryding, K. (2005). A Reference Grammar of Modern Standard Arabic. Cambridge: Cambridge University Press.Google Scholar

Saleh, M. (2012). Al-ḥāsūb wa-l bahth al luġawiyy (al mudawannāt alluġawiyyat namūdajan) [The Computer and Linguistic Research (Corpora as a Model)]. Jaamiʾat al-Malik Sauud, Riyadh, 79.Google Scholar

Samih, Y., Attia, M., Eldesouki, M., Mubarak, H., Abdelali, A., Kallmeyer, L., et al. (2017). A neural architecture for dialectal Arabic segmentation. In Habash, N., Diab, M., Darwish, K. et al. eds., Proceedings of the Third Arabic Natural Language Processing Workshop, Valencia, 46–54.Google Scholar

Schultz, T. and Schlippe, T. (2014). GlobalPhone: Pronunciation dictionaries in 20 languages. In Calzolari, N., Choukri, K., and Declerck, T. et al., eds., Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik: European Languages Resources Association, 337–41.Google Scholar

Sforza, V. and Soudi, A. (2007). Arabic computational morphology: A trade-off between multiple operations and multiple stems. In Soudi, A., van den Bosch, A., and Neumann, G., eds., Arabic Computational Morphology. Knowledge-Based and Empirical Methods. Dordrecht: Springer, 89–114.Google Scholar

Soliman, A., Eissa, K., and El-Beltagy, S. A. (2017). Aravec: A set of Arabic word embedding models for use in Arabic. Procedia Computer Science, 117, 256–65.Google Scholar

Soudi, A., van den Bosch, A., and Neumann, G. (2007). Arabic Computational Morphology: Knowledge-Based and Empirical Methods. Dordrecht: Springer.Google Scholar

Taji, D., Habash, N., and Zeman, D. (2017). Universal dependencies for Arabic. In Habash, N., Diab, M., Darwish, K. et al., eds., Proceedings of the Third Arabic Natural Language Processing Workshop, Valencia, 166–76.Google Scholar

Tratz, S. (2016). Arabic Dependency Treebank. ARL, US Army Research Laboratory, https://catalog.ldc.upenn.edu/docs/LDC2016T18/ARL-TN-0735.pdf.Google Scholar

Van den Bosch, A., Marsi, E., and Soudi, A. (2007). Memory-based morphological analysis and part-of-speech tagging of Arabic. In Soudi, A., van den Bosch, A., and Neumann, G., eds., Arabic Computational Morphology: Knowledge-Based and Empirical Methods. Dordrecht: Springer, 201–17.Google Scholar

Van Mol, M. (1998). Variatie in Modern Standaard Arabisch in radionieuwsbulletins, Een synchronisch descriptief onderzoek naar het gebruik van complementaire partikels. PhD dissertation, University of Leuven.Google Scholar

Van Mol, M. (2000). Arabic language and vocabulary acquisition. MIDEO, 24, 434–40.Google Scholar

Van Mol, M. (2001). Evolution of MSA: The case of some complementary particles. In Parkinson, D. and Farwaneh, S., eds., Perspectives on Arabic Linguistics XV. Amsterdam: Benjamins, 135–47.Google Scholar

Van Mol, M. (2003). Variation in Modern Standard Arabic in Radio News Broadcasts, A Synchronic Descriptive Investigation in the Use of Complementary Particles, Orientalia Lovaniensia Analecta, 117. Leuven: Peeters.Google Scholar

Van Mol, M. (2005). From lexical database to tagged Arabic corpus. Paper Presented at the ACIDA/ICMI Conference, Tozeur, 5–6 November. https://ilt.kuleuven.be/arabic/pdf/Mark%20Van%20Mol%20A031.pdf; last accessed 11 December 2020.Google Scholar

Van Mol, M. (2010). Arabic oral media and corpus linguistics: A first methodological outline. In Bassiouney, R., ed., Arabic and the Media: Linguistic Analyses and Applications. Leiden: Brill, 63–79.

Van Mol, M. (2012). From paper dictionary to an elaborate electronic lexicographical database. In Vatvedt, R. and Torjusen, J. M., eds., Proceedings of the 15th EURALEX International Congress, 7–11 August 2012. Oslo: Department of Linguistics and Scandinavian Studies, University of Oslo, 758–63.

Van Mol, M. (2014). تطوير متكامل إلكتروني لتدريس اللغة العربية للناطقين بغيرها [The development of an all-encompassing electronic device for L2 Arabic learners]. In Al-Qahtani, A. et al., eds., أعمال مؤتمر: اتجاهات حديثة في تعليم لغة ثانية [Proceedings of the Current Tendencies in the Teaching of Arabic as L2 Language Conference]. Riyadh: Dār Jāmiʿat al-Malik Saʿūd lil-Nashr, 219–55.

Van Mol, M. (2017a). La langue arabe et la définition de ses différents niveaux de langue. Exigences, possibilités et limitations d’une analyse numérique sur base de corpus représentatifs. In Medhat-Lecocq, H., ed., Arabe standard et variations régionales, Quelle(s) politique(s) linguistique(s)? Quelle(s) didactique(s)? Paris: Éditions des archives contemporaines, 3–46.

Van Mol, M. (2017b). Arabic language teaching and the real linguistic situation: What does linguistic empirical research teach us about Arabic language levels. In Shigeki, K., ed., Proceedings of the 8th Congress of Arabic Linguistics (2015). Kyoto: Tokyo University of Foreign Studies, 331–51.Google Scholar

Van Mol, M. and Berghman, K. (2001a). Leerwoordenboek Modern Arabisch–Nederlands (Learner’s Dictionary Modern Arabic–Dutch). Amsterdam: The Dutch Language Union, Bulaaq.

Van Mol, M. and Berghman, K. (2001b). Leerwoordenboek Nederlands–Modern Arabisch (Learner’s Dictionary Dutch–Modern Arabic). Amsterdam: The Dutch Language Union, Bulaaq.

Wehr, H. (1994). Arabic–English Dictionary, 4th ed. Urbana, IL: Spoken Language Services.Google Scholar

Whitcomb, L. and Alansary, S. (2018). Using linguistic corpora in Arabic Foreign Language Teaching. In Wahba, K., England, L., and Taha, Z. A., eds., Handbook for Arabic Language Teaching Professionals in the 21st Century, vol. II. New York: Routledge, 219–31.Google Scholar

Yaghan, M. A. (2008). Arabizi: A contemporary style of Arabic slang. Design Issues, 24, 39–52.Google Scholar

Yassen, K., Sawalha, M., and Al Zaghoul, F. (2017). Part-of-speech tagging for Classical and MSA text using NLTK. In Proceedings of the New Trends in Information Technology. Amman: University of Jordan, 106–12.Google Scholar

Yaʾqub, I. (1988). Mawsuʿat al-ḥurūf [Thesaurus]. Beirut: Dar al Jayl.Google Scholar

Zaghouani, W. (2014). Critical survey of the freely available Arabic corpora. In Calzolari, N., Choukri, K., and Declerck, T. et al., eds., Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), Reykjavik: European Language Resources Association, 1–8.

Zahran, M. A., Magooda, A., Mahgoub, A. Y., Raafat, H., Rashwan, M., and Atyia, A. (2015). Word representations in vector space and their applications for Arabic. In Gelbukh, A., ed., International Conference on Intelligent Text Processing and Computational Linguistics. Dordrecht: Springer, 430–43.Google Scholar

Zeroual, I., Lakhouaja, A., and Belhabib, R. (2017). Towards a standard part of speech tagset for the Arabic language. Journal of King Saud University – Computer and Information Sciences, 29(2), 171–8.

Corpora and Web Resources

Arabic WordNet http://globalwordnet.org/arabic-wordnet/.Google Scholar

Broad Operational Language Translation (BOLT) program: https://catalog.ldc.upenn.edu/LDC2017T07.Google Scholar

Infoguistics, Ayman Eddakrouri: https://sites.google.com/a/aucegypt.edu/infoguistics/directory/Corpus-Linguistics/arabic-corpora.Google Scholar

LDC (Linguistic Data Consortium) www.ldc.upenn.edu/.Google Scholar

Penn Arabic Treebank Guidelines: www.researchgate.net/publication/228395939_Penn_Arabic_Treebank_guidelines.Google Scholar

Prague Arabic Dependency Treebank (PADT) 1.0: https://catalog.ldc.upenn.edu/LDC2004T23.Google Scholar

Qamus Arabic Lexicography: Buckwalter, T. (2002). www.qamus.org/.

Quranic Arabic Corpus: https://corpus.quran.com/documentation/.Google Scholar

References

Abdulrahim, D. (2013). A Corpus Study of Basic Motion Events in Modern Standard Arabic. Unpublished doctoral dissertation, University of Alberta, Canada.Google Scholar

Abdulrahim, D. (2019a). Go constructions in Modern Standard Arabic: A corpus-based study. Constructions and Frames, 11(1), 1–42.Google Scholar

Abdulrahim, D. (2019b). Quantitative approaches to analysing COME constructions in Modern Standard Arabic. In McEnery, T., Hardie, A., and Younis, N., eds., Arabic Corpus Linguistics. Edinburgh: Edinburgh University Press, 170–200.Google Scholar

Abdul-Fattah, A. (2010). A Corpus-Based Study of Conjunction in Arabic Translated and Non-Translated Texts Written by the Same Translators/Authors. Doctoral dissertation, University of Manchester, UK.Google Scholar

Abdul-Fattah, A. (2018). Explicating structural shifts in English–Arabic translation: A corpus-based study of the causal conjunctives because and li’anna. Arab World English Journal for Translation and Literary Studies, 2(1), 39–59.Google Scholar

Abu Kwaik, K., Chatzikyriakidis, S., and Dobnik, S. (2019). Can Modern Standard Arabic approaches be used for Arabic dialects? Sentiment analysis as a case study. In El-Haj, M., Rayson, P., Atwell, E., and Alsudias, L., eds., Proceedings of the 3rd Workshop on Arabic Corpus Linguistics (WACL-3). Association for Computational Linguistics, 40–50. www.aclweb.org/anthology/W19-5606/; last accessed 14 December 2020.Google Scholar

Abu Kwaik, K., Saad, M., Chatzikyriakidis, S., and Dobnik, S. (2018a). Shami: A corpus of Levantine Arabic dialects. In Calzolari, N., Choukri, K., Cieri, C., Declerck, T. et al., eds., Proceedings of the Eleventh International Conference on Language Resources and Evaluation. European Language Resources Association (ELRA), 3645–52. www.aclweb.org/anthology/L18-1576; last accessed 14 December 2020.

Abu Kwaik, K., Saad, M., Chatzikyriakidis, S., and Dobnik, S. (2018b). A lexical distance study of Arabic dialects. Procedia Computer Science, 142, 2–13.Google Scholar

Al-Raisi, F., Lin, W., and Bourai, A. (2018). A monolingual parallel corpus of Arabic. Procedia Computer Science, 142, 334–8.Google Scholar

Al-Sulaiti, L. and Atwell, E. (2006). The design of a corpus of contemporary Arabic. International Journal of Corpus Linguistics, 11(1), 1–36.Google Scholar

Alasmri, I. and Kruger, H. (2018). Conjunctive markers in translation from English to Arabic: A corpus-based study. Perspectives: Studies in Translation Theory and Practice, 26(5), 767–88.Google Scholar

Albadarneh, J., Talafha, B., Al-Ayyoub, M., Zaqaibeh, B., Al-Smadi, M., Jararweh, Y., et al. (2015). Using big data analytics for authorship authentication of Arabic tweets. In Anjum, A. and Papadopolous, G., eds., Proceedings of the 8th International Conference on Utility and Cloud Computing, IEEE Press, 448–52.Google Scholar

Alfaifi, A. and Atwell, E. (2013). Potential uses of the Arabic Learner Corpus. In Proceedings of the Leeds Language, Linguistics and Translation PGR Conference (2013). Leeds, UK.Google Scholar

Alfaifi, A., Atwell, E., and Hedaya, I. (2014). Arabic Learner Corpus (ALC) v2: A new written and spoken corpus of Arabic learners. In Ishikawa, S., ed., Proceedings of the Learner Corpus Studies in Asia and the World (LCSAW), 77–89.

Alotaibi, H. M. (2017). Arabic–English Parallel Corpus: A new resource for translation training and language teaching. Arab World English Journal, 8(3), 319–37.Google Scholar

Alshutayri, A. and Atwell, E. (2019). Classifying Arabic dialect text in the Social Media Arabic Dialect Corpus (SMADC). In El-Haj, M., Rayson, P., Atwell, E., and Alsudias, L., eds., Proceedings of the 3rd Workshop on Arabic Corpus Linguistics (WACL-3). Association for Computational Linguistics, 51–9.

Arts, T., Belinkov, Y., Habash, N., Kilgarriff, A., and Suchomel, V. (2014). arTenTen: Arabic corpus and word sketches. Journal of King Saud University – Computer and Information Sciences, 26, 357–71. www.sciencedirect.com/science/article/pii/S1319157814000330; last accessed 14 December 2020.

Atwell, E. (2019). Using the web to model Modern and Qur’anic Arabic. In McEnery, T., Hardie, A., and Younis, N., eds., Arabic Corpus Linguistics. Edinburgh: Edinburgh University Press, 100–19.Google Scholar

Baker, M. (1993). Corpus Linguistics and Translation Studies: Implications and Applications. Amsterdam: John Benjamins.Google Scholar

Bazzi, S. (2014). Foreign metaphors and Arabic translation: An empirical study in journalistic translation practice. Journal of Language and Politics, 13(1), 120–51.Google Scholar

Beeby, A., Rodríguez Inés, P., and Sánchez-Gijón, P. (2009). Corpus Use and Translating: Corpus Use for Learning to Translate and Learning Corpus Use to Translate. Amsterdam: John Benjamins.Google Scholar

Belinkov, Y., Magidow, A., Barrón-Cedeño, A., Shmidman, A., and Romanov, M. (2019). Studying the history of the Arabic language: Language technology and a large-scale historical corpus. Language Resources and Evaluation, 53, 771–805. https://doi.org/10.1007/s10579-019-09460-w; last accessed 14 December 2020.

Ben Salhi, H. (2010). Small parallel corpora in an English–Arabic translation classroom: No need to reinvent the wheel in the era of globalization. In Shiyab, S. M., Rose, M. G., House, J., and Duval, J., eds., Globalization and Aspects of Translation. Newcastle upon Tyne: Cambridge Scholars Publishing, 53–67.

Bernardini, S., Stewart, D., and Zanettin, F. (2007). Corpora in translator education: An introduction. In Zanettin, F., Bernardini, S., and Stewart, D., eds., Corpora in Translator Education. Beijing: Foreign Language Teaching and Research Press, 1–14.Google Scholar

Bouamor, H., Habash, N., and Oflazer, K. (2014). A multidialectal parallel corpus of Arabic. In Calzolari, N., Choukri, K., Declerck, T., Loftsson, H. et al., eds., Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC 2014). European Language Resources Association (ELRA), 1240–5.

Boudad, N., Faizi, R., Haj Thami, R., and Chiheb, R. (2018). Sentiment analysis in Arabic: A review of the literature. Ain Shams Engineering Journal, 9(4), 2479–90.Google Scholar

Braun, S. (2007). Integrating corpus work into secondary education: From data-driven learning to needs-driven corpora. ReCALL, 19(3), 307–28.Google Scholar

Brustad, K. (2000). The Syntax of Spoken Arabic: A Comparative Study of Moroccan, Egyptian, Syrian, and Kuwaiti Dialects. Washington, DC: Georgetown University Press.Google Scholar

Buckwalter, T. and Parkinson, D. (2011). A Frequency Dictionary of Arabic: Core Vocabulary for Learners (Routledge frequency dictionaries). London: Routledge.Google Scholar

Burton, G. (2012). Corpora and coursebooks: Destined to be strangers forever? Corpora, 7(1), 91–108.Google Scholar

Camilleri, J. J. (2016). Digitizing the grammar and vocabulary of Maltese. In Puech, G. and Saade, B., eds., Shifts and Patterns in Maltese. Berlin: De Gruyter Mouton, 359–85.Google Scholar

Campoy, M., Bellés-Fortuño, B., and Gea-Valor, M. (2010). Corpus-Based Approaches to English Language Teaching. London: Continuum.Google Scholar

Chambaz, A. and Desagulier, G. (2016). Predicting is not explaining: Targeted learning of the dative alternation. Journal of Causal Inference, 4(1), 1–30.Google Scholar

Chambers, A. (2007). Popularising corpus consultation by language learners and teachers. In Hidalgo, E., Quereda, L., and Santana, J., eds., Corpora in the Foreign Language Classroom: Selected Papers from the Sixth International Conference on Teaching and Language Corpora (TALC 6). Amsterdam: Rodopi, 3–14.Google Scholar

Clark, A. and Lappin, S. (2011). Linguistic Nativism and the Poverty of the Stimulus. Chichester, UK: Wiley-Blackwell.Google Scholar

Conrad, S. (2005). Corpus linguistics and L2 teaching. In Hinkel, E., ed., Handbook of Research in Second Language Teaching and Learning. London: Routledge, 393–411.Google Scholar

Conrad, S., and Biber, D. (2009). Real Grammar: A Corpus-Based Approach to English. London: Pearson.Google Scholar

Corpas Pastor, G. and Seghiri, M. (2010). Size matters: A quantitative approach to corpus representativeness. In Rabadán, R., ed., Lengua, traducción, recepción: en honor de Julio César Santoyo / Language, translation, reception. To honor Julio César Santoyo. León: Universidad de León, 111–46.Google Scholar

Cowie, F. (1999). What’s Within? Nativism Reconsidered. Oxford: Oxford University Press.Google Scholar

Desagulier, G. (2017). Noam Chomsky’s colorless green idea: ‘Corpus linguistics doesn’t mean anything’. Around the Word, 5 December 2017. https://corpling.hypotheses.org/252; last accessed 14 December 2020.Google Scholar

Duwairi, R., Ahmed, N. A., and Al-Rifai, S. (2015). Detecting sentiment embedded in Arabic social media: A lexicon-based approach. Journal of Intelligent Fuzzy Systems, 29(1), 107–17.Google Scholar

El-Fiqi, H., Petraki, E., and Abbass, H. A. (2019). Network motifs for translator stylometry identification. PLOS ONE 14(2), 2039–45. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0211809; last accessed 14 December 2020.Google Scholar

Farghaly, A. and Shaalan, K. (2009). Arabic natural language processing: Challenges and solutions. ACM Transactions on Asian Language Information Processing, 8, 1–20.Google Scholar

Frankenberg-Garcia, A., Flowerdew, L., and Aston, G. (2010). New Trends in Corpora and Language Learning (Research in corpus and discourse). New York: Continuum International Publications.Google Scholar

Gadalla, H. (2006). Arabic imperfect verbs in translation: A corpus study of English renderings. Meta: Translator’s Journal, 51(1), 51–71.Google Scholar

Gaskell, D. and Cobb, T. (2004). Can learners use concordance feedback for writing errors? System, 32(3), 301–19.Google Scholar

Götz, S. and Mukherjee, J. (2019). Learner Corpora and Language Teaching, Studies in Corpus Linguistics, vol. 92. Amsterdam: John Benjamins.Google Scholar

Granger, S. (2008). Learner corpora. In Lüdeling, A. and Kytö, M., eds., Corpus Linguistics: An International Handbook, vol. 1. Berlin: Walter de Gruyter, 259–75.Google Scholar

Guidère, M. (2002). Toward corpus-based machine translation for Standard Arabic. Translation Journal, 6(1). https://pdfs.semanticscholar.org/268e/f55030a49207071c56538c634965ee568ed8.pdf; last accessed 14 December 2020.Google Scholar

Habash, N. Y. (2010). Introduction to Arabic Natural Language Processing (Synthesis lectures on human language technologies, #10). San Rafael, CA: Morgan & Claypool.Google Scholar

Hidalgo, E., Quereda, L., and Santana, J. (2007). Corpora in the Foreign Language Classroom: Selected Papers from the Sixth International Conference on Teaching and Language Corpora (TALC 6), University of Granada, Spain, 4–7 July 2004 (Language and Computers, no. 61). Amsterdam: Rodopi.

Hoogland, J. (1996). The use of OCR software for Arabic in order to create a text corpus of Modern Standard Arabic for lexicographic purposes. In Ubaydli, A., ed., Proceedings of the International Conference and Exhibition on Multi-Lingual Computing. Cambridge: Cambridge University Press, 2701–16.Google Scholar

Hu, K. (2016). Introducing Corpus-based Translation Studies (New frontiers in translation studies). Heidelberg: Springer.Google Scholar

Inoue, G., Habash, N., Matsumoto, Y., and Aoyama, H. (2018). A parallel corpus of Arabic–Japanese news articles. In Calzolari, N., Choukri, K., Cieri, C., Declerck, T. et al., eds., Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 918–24. https://aclweb.org/anthology/L18-1147; last accessed 14 December 2020.

Izwaini, S. (2010). Translation and the Language of Information Technology: A Corpus-Based Study of the Vocabulary of Information Technology in English and Its Translation into Arabic and Swedish. Saarbrücken: VDM Verlag Dr. Müller.Google Scholar

Khalifa, S., Habash, N., Abdulrahim, D., and Hassan, S. (2016). A large-scale corpus of Gulf Arabic. In Calzolari, N., Choukri, K., Cieri, C., Declerck, T. et al., eds., Proceedings of the Tenth International Conference on Language Resources and Evaluation, Portorož, Slovenia. Luxembourg: European Language Resources Association (ELRA), 4282–9. www.aclweb.org/anthology/L16-1679; last accessed 14 December 2020.

Kilgarriff, A., Charalabopoulou, F., Gavrilidou, M., Johannessen, J. B., Khalil, S., Johansson Kokkinakis, S., et al. (2014). Corpus-based vocabulary lists for language learners for nine languages. Language Resources and Evaluation, 48(1), 121–63.Google Scholar

Kruger, A. (2002). Corpus-based translation research: Its development and implications for general, literary and Bible studies. Acta Theologica Supplementum, 2, 70–106.Google Scholar

Laviosa, S. (1998). The corpus-based approach: A new paradigm in translation studies. Meta: Translators’ Journal, 43(4), 474–9.Google Scholar

Lo, M. (2019). The Arabic Classroom: Context, Text and Learners. Abingdon, Oxon: Routledge.Google Scholar

Lulu, L. and Elnagar, A. (2018). Automatic Arabic dialect classification using deep learning models. Procedia Computer Science, 142, 262–9.Google Scholar

MacWhinney, B. (2004). A multiple process solution to the problem of language acquisition. Journal of Child Language, 31(4), 883–914.Google Scholar

Malmkjaer, K. (2003). On a pseudo-subversive use of corpora in translator training. In Zanettin, F., Bernardini, S., and Stewart, D., eds., Corpora in Translator Education, Manchester, UK: St. Jerome, 119–34.Google Scholar

Mansour, M. A. (2013). The absence of Arabic corpus linguistics: A call for creating an Arabic National Corpus. International Journal of Humanities and Social Science, 3(12), 81–90.Google Scholar

McEnery, T., Hardie, A. and Younis, N. (2019a). Introducing Arabic Corpus Linguistics. In McEnery, T., Hardie, A., and Younis, N., eds., Arabic Corpus Linguistics. Edinburgh: Edinburgh University Press, 1–16.Google Scholar

McEnery, T., Hardie, A., and Younis, N. (2019b). Arabic Corpus Linguistics. Edinburgh: Edinburgh University Press.Google Scholar

McNeil, K. (2019). Tunisian Arabic Corpus: Creating a written corpus of an ‘unwritten’ language. In McEnery, T., Hardie, A., and Younis, N., eds., Arabic Corpus Linguistics. Edinburgh: Edinburgh University Press, 30–55.Google Scholar

Oakes, M., and Ji, M. (2012). Quantitative Methods in Corpus-Based Translation Studies: A Practical Guide to Descriptive Translation Research, Studies in Corpus Linguistics, vol. 51. Amsterdam: John Benjamins.Google Scholar

O’Keeffe, A., McCarthy, M., and Carter, R. (2007). From Corpus to Classroom: Language Use and Language Teaching. Cambridge: Cambridge University Press.Google Scholar

Olohan, M. (2004). Introducing Corpora in Translation Studies. London: Routledge.Google Scholar

Omer, A. and Oakes, M. (2019). The writing styles of Salwa and Al-Qarni. In El-Haj, M., Rayson, P., Atwell, E., and Alsudias, L., eds., Proceedings of the 3rd Workshop on Arabic Corpus Linguistics (WACL-3). Association for Computational Linguistics, 16–21.Google Scholar

Owens, J. and Hassan, J. (nd). In Their Own Words: A Sociolinguistically Informed Corpus of Nigerian Arabic. Ms. Universität Bayreuth. www.neu.uni-bayreuth.de/de/Uni_Bayreuth/Fakultaeten/4_Sprach_und_Literaturwissenschaft/islamwissenschaft/arabistik/en/Idiomaticity__lexical_realignment__and_semantic_change_in_spoken_arabic/Nigerian_Arabic/index.html; last accessed 14 December 2020.Google Scholar

Parkinson, D. (2003). Future variability: A corpus study of Arabic future particles. In Parkinson, D. and Farwaneh, S., eds., Perspectives on Arabic Linguistics XV: Papers from the Fifteenth Annual Symposium on Arabic Linguistics, Salt Lake City (2001). Amsterdam: John Benjamins, 191–211.Google Scholar

Parkinson, D. (2006). Using Arabic Synonyms. Cambridge: Cambridge University Press.Google Scholar

Parkinson, D. (2008). Sentence subject agreement variation in Arabic. In Ibrahim, Z. and Makhlouf, S., eds, Linguistics in an Age of Globalization: Perspectives on Arabic Language and Teaching. Cairo: The American University in Cairo Press, 67–90.Google Scholar

Parkinson, D. (2019). Under the hood of arabiCorpus. In McEnery, T., Hardie, A., and Younis, N., eds., Arabic Corpus Linguistics. Edinburgh: Edinburgh University Press, 17–29.Google Scholar

Parkinson, D. and Ibrahim, Z. (1999). Testing lexical differences in regional standard Arabics. In Benmamoun, E., ed., Perspectives on Arabic Linguistics XII. Amsterdam: John Benjamins, 183–202.Google Scholar

Procházka, S. and Dallaj, I. (2020). Polar questions in Tunis Arabic. In Chikovani, G. and Tskhvediani, Z., eds., Studies on Arabic Dialectology and Sociolinguistics: Proceedings of the 13th International Conference of AIDA held in Kutaisi, June 10–13, 2019. Kutaisi: Akaki Tsereteli State University Press, 233–40.

Pullum, G. K. and Scholz, B. C. (2002). Empirical assessment of stimulus poverty arguments. The Linguistic Review, 19, 9–50.

Romanov, M. (2019). Toward the digital history of the pre-modern Muslim world: Developing text-mining techniques for the study of Arabic biographical collections. In Andrews, T. L. and Macé, C., eds., Analysis of Ancient and Medieval Texts and Manuscripts: Digital Approaches. Brepols Online, 229–44.Google Scholar

Saffran, J. (2003). Statistical language learning: Mechanisms and constraints. Current Directions in Psychological Science, 12(4), 110–14.Google Scholar

Salameh, M., Bouamor, H., and Habash, N. (2018). Fine-grained Arabic dialect identification. In Proceedings of the International Conference on Computational Linguistics (COLING), 1332–44.Google Scholar

Sampson, G. (1989). Language acquisition: Growth or learning? Philosophical Papers, 18, 203–40.Google Scholar

Samy, D., Sandoval, A. M., Guirao, J. M., and Alfonseca, E. (2006). Building a parallel multilingual corpus (Arabic–Spanish–English). In Calzolari, N., Choukri, K., Gangemi, A., Maegaard, B. et al., eds., Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006). European Language Resources Association (ELRA), 2176–81.

Shiyab, S. M., Rose, M. G., House, J., and Duval, J. (2010). Globalization and Aspects of Translation. Newcastle upon Tyne: Cambridge Scholars Publishing.

Simpson, R. and Swales, J. (2001). Corpus Linguistics in North America: Selections from the 1999 Symposium. Ann Arbor, MI: University of Michigan Press.Google Scholar

Stubbs, M. (1993). British traditions in text analysis: From Firth to Sinclair. In Baker, M., Francis, G., and Tognini-Bonelli, E., eds., Text and Technology. Philadelphia: John Benjamins, 1–33.Google Scholar

Thackston, W. M. Jr. (1996). The Vernacular Arabic of the Lebanon. Cambridge, MA: Department of Near Eastern Languages and Civilizations, Harvard University.Google Scholar

Tymoczko, M. (1998). Computerized corpora and the future of translation studies. Meta, 43(4), 452–9.Google Scholar

Van Mol, M. (2000). The development of a new learner’s dictionary for Modern Standard Arabic: The linguistic corpus approach. In Heid, U., Evert, S., Lehmann, E., and Rohrer, C., eds., Proceedings of the Ninth EURALEX International Congress. Stuttgart, 831–6.Google Scholar

Whitcomb, L. and Alansary, S. (2018). Using linguistic corpora in Arabic foreign language teaching and learning. In Wahba, K., England, L., and Taha, Z., eds., Handbook for Arabic Language Teaching Professionals in the 21st Century, vol. II. New York: Routledge, 219–31.

White, M. G. and Lonsdale, D. W. (2019). Verbs in Egyptian Arabic: A case for register variation. In El-Haj, M., Rayson, P., Atwell, E., and Alsudias, L., eds., Proceedings of the 3rd Workshop on Arabic Corpus Linguistics (WACL-3). Association for Computational Linguistics, 60–71. www.aclweb.org/anthology/W19-5608; last accessed 14 December 2020.Google Scholar

Wichmann, A. (1997). Teaching and Language Corpora (Applied Linguistics and Language Study). London: Longman.Google Scholar

Widdowson, H. (1991). The description and prescription of language. In Alatis, J., ed., Georgetown University Roundtable on Language and Linguistics, 1991–Linguistics and language pedagogy: The State of the Art. Washington, DC: Georgetown University Press, 11–24.Google Scholar

Wilmsen, D. (2010). Dialects of written Arabic: Syntactic differences in the treatment of object pronouns in the Arabic of Egyptian and Levantine newspapers. Arabica, 57(1), 99–128.Google Scholar

Wilmsen, D. (2013). The Demonstrative iyyā-: A little-considered aspect of Arabic deixis. Arabica, 60, 332–58.Google Scholar

Wilmsen, D. (2015). Perfect modality: Auxiliary verbs and finite subordinates in Levantine (and other) Arabics. Al-ʿArabiyya, 48, 157–74.Google Scholar

Zaghouani, W. (2014). Critical survey of the freely available Arabic corpora. In Proceedings of International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland. https://arxiv.org/pdf/1702.07835.pdf; last accessed 14 December 2020.

Zaki, M. (2017). Corpus-based teaching in the Arabic classroom: Theoretical and practical perspectives. International Journal of Applied Linguistics, 27(2), 514–41.Google Scholar

Zantout, R. and Guessoum, A. (2015). Obstacles facing Arabic machine translation: Building a neural network-based transfer module. In Izwaini, S., ed., Papers in Translation Studies. Newcastle upon Tyne: Cambridge Scholars Publishing.

Zemánek, P. (2001). Clara (Corpus Linguae Arabicae): An overview. In ELSNET, ed., Proceedings of ACL/EACL Workshop on Arabic Language Processing. Toulouse, France. www.elsnet.org/acl2001-arabic.html; last accessed 14 December 2020.Google Scholar

Zemánek, P. and Milička, J. (2014). Quotations, relevance and time depth: Medieval Arabic literature in grids and networks. In Feldman, A., Kazantseva, A., and Szpakowicz, S., eds., Proceedings of the 3rd Workshop on Computational Linguistics for Literature (CLFL). Association for Computational Linguistics, 17–24. www.aclweb.org/anthology/W14-0903; last accessed 14 December 2020.

Zeroual, I. and Lakhouaja, A. (2018). Arabic corpus linguistics: Major progress, but still a long way to go. Studies in Computational Intelligence, 740, 613–36.Google Scholar

Zitouni, I., Olive, J. P., Iskra, D., Choukri, K., Emam, O., Gedge, O., et al. (2002). ORIENTEL: Speech-based interactive communication applications for the Mediterranean and the Middle East. In Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP 2002). https://pdfs.semanticscholar.org/fec6/336ef6a292e18fb83a46841fea9c80b77955.pdf; last accessed 14 December 2020.Google Scholar

Arabic Corpora

Arabic Learner Corpus: http://www.arabiclearnercorpus.com

arabiCorpus: http://arabicorpus.byu.edu

CALM: http://linguistics.byu.edu/thesisdata/CALMcorpusDownload.html

Gumar: https://camel.abudhabi.nyu.edu/gumar/

Korpus Malti: http://mlrs.research.um.edu.mt/index.php?page=corpora

shami-corpus: https://github.com/GU-CLASP/shami-corpus

Tunico: https://tunico.acdh.oeaw.ac.at/about_corpus.html

Tunisiya: http://www.tunisiya.org
