Annotating a broad range of anaphoric phenomena, in a variety of genres: the ARRAU Corpus

Olga Uryupina; Ron Artstein; Antonella Bristot; Federica Cavicchio; Francesca Delogu; Kepa J. Rodriguez; Massimo Poesio

doi:10.1017/S1351324919000056

Published online by Cambridge University Press: 07 May 2019

Olga Uryupina*: Department of Information Engineering and Computer Science, University of Trento
Ron Artstein: Institute for Creative Technologies, University of Southern California
Federica Cavicchio: Sign Language Lab, University of Haifa
Francesca Delogu: Department of Computational Linguistics & Phonetics, Saarland University
Kepa J. Rodriguez: Archives Division, Yad Vashem
Massimo Poesio: School of Electronic Engineering and Computer Science, Queen Mary University of London
*Corresponding author. Email: uryupina@gmail.com

Abstract

This paper presents the second release of ARRAU, a multigenre corpus of anaphoric information created over 10 years to provide data for the next generation of coreference/anaphora resolution systems combining different types of linguistic and world knowledge with advanced discourse modeling supporting rich linguistic annotations. The distinguishing features of ARRAU include the following: treating all NPs as markables, including non-referring NPs, and annotating their (non-)referentiality status; distinguishing between several categories of non-referentiality and annotating non-anaphoric mentions; thorough annotation of markable boundaries (minimal/maximal spans, discontinuous markables); annotating a variety of mention attributes, ranging from morphosyntactic parameters to semantic category; annotating the genericity status of mentions; annotating a wide range of anaphoric relations, including bridging relations and discourse deixis; and, finally, annotating anaphoric ambiguity. The current version of the dataset contains 350K tokens and is publicly available from LDC. In this paper, we discuss in detail all the distinguishing features of the corpus, so far only partially presented in a number of conference and workshop papers, and we also discuss the development between the first release of ARRAU in 2008 and this second one.

Keywords

coreference; anaphora; discourse; annotation; linguistic corpora

Type: Article
Information: Natural Language Engineering, Volume 26, Issue 1, January 2020, pp. 95–128

DOI: https://doi.org/10.1017/S1351324919000056
Copyright: © Cambridge University Press 2019
