AnCora-CO-Es


AnCora-CO-Es
Authors M. Antònia Martí, Mariona Taulé, Marta Recasens, Lluís Màrquez and Manuel Bertran (CLiC-UB)
URL http://clic.ub.edu/ancora
Contact Mariona Taulé <mtaule@ub.edu>


Description

AnCora-CO-Es is a 400,000-word subset of the multilevel annotated corpus AnCora-Es (for Spanish), enriched with coreference information: all noun phrases (NPs), whether pronominal or with a nominal head, that point to the same entity are linked.

Functionality

AnCora-CO-Es can be a useful resource for training and evaluating coreference resolution systems for Spanish. From a linguistic point of view, the annotated corpus can be used as a workbench to test and validate hypotheses on coreferential expressions in Spanish. This corpus will be used in the SemEval 2010 coreference resolution task.

Technology

The data are stored in XML format.
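Since the data are XML, coreference chains could be collected with a standard XML parser. The following is only a minimal sketch: the element name `np`, the attribute name `entity`, and the sample text are assumptions made for illustration and are not taken from the AnCora-CO-Es documentation, so they would need to be adapted to the actual markup of the corpus files.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample. The "np" element and the "entity" attribute are
# illustrative assumptions, not the documented AnCora-CO-Es schema.
SAMPLE = """
<doc>
  <s><np entity="e1">La presidenta</np> llegó ayer.</s>
  <s><np entity="e1">Ella</np> habló con <np entity="e2">los periodistas</np>.</s>
</doc>
"""


def coreference_chains(xml_text):
    """Group the text of each annotated NP by its entity identifier."""
    root = ET.fromstring(xml_text)
    chains = defaultdict(list)
    # iter() walks the tree in document order, so mentions stay in text order.
    for np in root.iter("np"):
        entity = np.get("entity")
        if entity is not None:
            mention = "".join(np.itertext()).strip()
            chains[entity].append(mention)
    return dict(chains)


chains = coreference_chains(SAMPLE)
for entity, mentions in chains.items():
    print(entity, mentions)
```

Grouping mentions by a shared identifier mirrors the description above, where all NPs pointing to the same entity are linked.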

Technical Requirements

-

Modules

-

Innovation

At present, AnCora-CO-Es is the largest freely available Spanish corpus annotated with coreference.

Development

The development of AnCora-CO-Es has been funded by the following projects: PRAXEM (HUM2006-27378-E) and Lang2World (TIN2006-15265-C06-06) from the Spanish Ministry of Education and Science.

Publications

Recasens, M., M. A. Martí and M. Taulé (2008) First-mention Definites: More than Exceptional Cases. In S. Featherston & S. Winkler (eds), Fruits: Process and Product in Empirical Linguistics. Berlin: de Gruyter.

Recasens, M. (2008) Towards Coreference Resolution for Catalan and Spanish. Master's thesis. Universitat de Barcelona.

Recasens, M., M. A. Martí and M. Taulé (2007) Where Anaphora and Coreference Meet: Annotation in the CESS-ECE Corpus. Recent Advances in Natural Language Processing. Borovets, Bulgaria.
