|Authors||M. Antònia Martí, Mariona Taulé, Lluís Màrquez and Manuel Bertran (CLiC-UB)|
|Contact||M. Antònia Martí <amartiub.edu>|
AnCora-DEP-Ca is the dependency-based representation of AnCora-Ca, the multilevel annotated corpus of Catalan. It consists of approximately 500,000 words.
AnCora-DEP-Ca can be used as a source of information for inducing grammars and for developing, improving and/or evaluating dependency-based syntactic parsers and semantic role labelling algorithms. The corpus was used in the CoNLL Shared Task 2009: Syntactic and Semantic Dependencies in Multiple Languages, where the core of the task is to predict syntactic and semantic dependencies and their labels.
The data are stored in XML format.
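The XML schema is not described here, so a reasonable first step when opening a corpus file is to inventory which element tags it contains. The following is a minimal sketch using only the Python standard library. The `tag_inventory` helper and the `SAMPLE` snippet (including its `sentence` and `node` elements and their attributes) are hypothetical illustrations, not the actual AnCora-DEP-Ca markup.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_inventory(xml_text: str) -> Counter:
    """Count how often each element tag occurs in an XML document."""
    root = ET.fromstring(xml_text)
    # Element.iter() visits the root element and all of its descendants.
    return Counter(element.tag for element in root.iter())


# Hypothetical sample for illustration only; this is NOT the real
# AnCora-DEP-Ca schema, whose element and attribute names may differ.
SAMPLE = """<sentence>
  <node form="El" head="2"/>
  <node form="gat" head="3"/>
  <node form="dorm" head="0"/>
</sentence>"""

inventory = tag_inventory(SAMPLE)
for tag, count in inventory.most_common():
    print(tag, count)
```

For a real corpus file, the same function could be applied to the file contents read as text; the resulting inventory indicates which elements to look at before writing a schema-specific reader.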
At present, AnCora-DEP-Ca is the largest multilevel annotated corpus in dependency format that is freely available for download.
The development of AnCora-DEP-Ca has been funded by the projects CESS-ECE (HUM2004-21127) and Lang2World (TIN2006-15265-C06-06), and by funding from the Catalan Secretary of Linguistic Policy.
Civit, M., M.A. Martí & N. Bufí (2006). ‘Cat3LB and Cast3LB: From Constituents to Dependencies’. Advances in Natural Language Processing (LNAI, 4139), pp. 141-153. Berlin: Springer Verlag. ISSN: 0302-9743.