Morphological disambiguation of Hebrew: a case study in classifier combination

GENNADI LEMBERSKY; DANNY SHACHAM; SHULY WINTNER

doi:10.1017/S1351324912000216

Published online by Cambridge University Press: 26 July 2012

Affiliation (all authors): Department of Computer Science, University of Haifa, Haifa, Israel
E-mails: glembers@campus.haifa.ac.il, danny@shach.am, shuly@cs.haifa.ac.il

Abstract

Morphological analysis and disambiguation are crucial stages in a variety of natural language processing applications, especially when languages with complex morphology are concerned. We present a system which disambiguates the output of a morphological analyzer for Hebrew. It consists of several simple classifiers and a module that combines them under the constraints imposed by the analyzer. We explore several approaches to classifier combination, as well as a back-off mechanism that relies on a large unannotated corpus. Our best result, around 83 percent accuracy, compares favorably with the state of the art on this task.

Type: Articles
Information: Natural Language Engineering, Volume 20, Issue 1, January 2014, pp. 69–97

DOI: https://doi.org/10.1017/S1351324912000216
Copyright: Copyright © Cambridge University Press 2012
