|Authors||M. Antònia Martí, Mariona Taulé, Marta Recasens, Lluís Màrquez and Manuel Bertran (CLiC-UB)|
|Contact||Mariona Taulé <mtaule@ub.edu>|
AnCora-CO-Es is a subset of AnCora-Es, the multilevel annotated corpus for Spanish. It consists of 400,000 words enriched with coreference information: all noun phrases (NPs), whether pronominal or headed by a noun, that refer to the same entity are linked.
AnCora-CO-Es can be a useful resource for training and evaluating coreference resolution systems for Spanish. From a linguistic point of view, the annotated corpus can serve as a workbench for testing and validating hypotheses about Spanish coreferential expressions. The corpus will be used in the SemEval-2010 coreference resolution task.
The data are stored in XML format.
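Because every NP that refers to the same entity is linked, a natural first step when using the corpus is to collect those links into coreference chains. The sketch below does this with Python's standard `xml.etree.ElementTree`. It is a minimal illustration under stated assumptions: the NP element name (`sn`), the entity attribute (`entity`), the word attribute (`wd`) and the toy XML fragment are all hypothetical stand-ins, exposed as parameters so they can be adjusted to the element and attribute names actually used in the AnCora-CO-Es files.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# HYPOTHETICAL toy fragment ("María dijo que ella vendría el lunes").
# Tag and attribute names are assumptions, not the documented corpus schema.
SAMPLE = """<sentence>
  <sn entity="entity1"><n wd="María"/></sn>
  <v wd="dijo"/>
  <c wd="que"/>
  <sn entity="entity1"><p wd="ella"/></sn>
  <v wd="vendría"/>
  <sn entity="entity2"><d wd="el"/><n wd="lunes"/></sn>
</sentence>"""


def coreference_chains(root, np_tag="sn", entity_attr="entity", word_attr="wd"):
    """Group NP mentions into chains keyed by entity ID.

    np_tag, entity_attr and word_attr are hypothetical defaults; adjust
    them to the names used in the real corpus files.
    """
    chains = defaultdict(list)
    # iter() walks the tree in document order, so each chain keeps text order.
    for mention in root.iter(np_tag):
        entity = mention.get(entity_attr)
        if entity is None:
            continue  # NP not linked to any entity
        # Surface form of the mention: the words of all its descendants.
        words = [el.get(word_attr) for el in mention.iter() if el.get(word_attr)]
        chains[entity].append(" ".join(words))
    return dict(chains)


chains = coreference_chains(ET.fromstring(SAMPLE))
print(chains)  # {'entity1': ['María', 'ella'], 'entity2': ['el lunes']}
```

For a file on disk, `ET.parse(path).getroot()` can be passed instead of `ET.fromstring(...)`; `ET.parse` reads the file as bytes and honours the encoding declared in its XML header.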
At present, AnCora-CO-Es is the largest freely available Spanish corpus annotated with coreference.
The development of AnCora-CO-Es has been funded by two projects of the Spanish Ministry of Education and Science: PRAXEM (HUM2006-27378-E) and Lang2World (TIN2006-15265-C06-06).
Recasens, M., M. A. Martí, M. Taulé (2008) First-mention Definites: More than Exceptional Cases. In S. Featherston & S. Winkler (eds), Fruits: Process and Product in Empirical Linguistics. Berlin: de Gruyter.
Recasens, M. (2008) Towards Coreference Resolution for Catalan and Spanish. Master's thesis. Universitat de Barcelona.
Recasens, M., M. A. Martí, M. Taulé (2007) Where Anaphora and Coreference Meet. Annotation in the CESS-ECE Corpus. In Recent Advances in Natural Language Processing. Borovets, Bulgaria.