
EVOCA Corpus

Resource type:

Corpora

Description:

EVOCA (English Version of OCA) is an English corpus produced by translating the Arabic corpus OCA. It contains movie reviews, divided into 250 positive and 250 negative reviews. The translation was carried out in April 2011. The following table shows some statistics on the corpus:

                               Negative   Positive
Total documents                     250        250
Total tokens                    122,135    153,581
Average tokens per review        488.54     614.32
Total sentences                   5,030      3,483
Average sentences per review      20.12      13.93
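
The per-review averages in the table follow directly from the totals: each one is a total divided by the 250 documents in its class. As a quick consistency check, the short Python sketch below recomputes them. The dictionary layout and the function name per_review_averages are illustrative choices and not part of the corpus distribution; the figures themselves are copied from the table.

```python
# Totals copied from the statistics table (thousands separators removed).
totals = {
    "negative": {"documents": 250, "tokens": 122135, "sentences": 5030},
    "positive": {"documents": 250, "tokens": 153581, "sentences": 3483},
}


def per_review_averages(stats):
    """Return (average tokens, average sentences) per review, rounded to 2 decimals."""
    docs = stats["documents"]
    return round(stats["tokens"] / docs, 2), round(stats["sentences"] / docs, 2)


# Hypothetical helper structure: class label -> (avg tokens, avg sentences).
averages = {label: per_review_averages(stats) for label, stats in totals.items()}

for label, (avg_tokens, avg_sentences) in averages.items():
    print(f"{label}: {avg_tokens:.2f} tokens/review, {avg_sentences:.2f} sentences/review")
```

The recomputed values agree with the averages reported in the table for both classes.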

Rushdi Saleh, M., Martín-Valdivia, M. T., Ureña-López, L. A. & Perea-Ortega, J. M. (2011). Bilingual Experiments with an Arabic-English Corpus for Opinion Mining. Proceedings of Recent Advances in Natural Language Processing, pages 740–745.

For any questions about the corpus, send an email to Mohammed Saleh or José M. Perea.

Resource files:

EVOCA-corpus.rar