MCE Corpus

Resource type:

Corpora

Description:

The MuchoCine corpus in English (MCE) is the translated version of the MuchoCine corpus (Spanish movie reviews). The MuchoCine corpus was developed by the researcher Fermín Cruz Mata and presented in 2008 in issue 41 of the journal Natural Language Processing, in the paper titled "Document Classification based on Opinion: experiments with a corpus of Spanish cinema reviews".

The paper "Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches" tests the validity of a methodology for polarity classification in Spanish. The methodology combines three classifiers: two supervised classifiers (one trained on texts in English and one on texts in another language) and one unsupervised classifier that uses an English-language resource for sentiment analysis. This methodology was previously proposed for opinions in Arabic in the paper "Improving Polarity Classification of Bilingual Parallel Corpora combining Machine Learning and Semantic Orientation approaches" (in press).

The polarity of each document in the corpus is rated on a scale of 1 to 5, with 1 being very bad and 5 very good. The distribution of documents across polarity values is:

Polarity    Number of documents
1           351
2           923
3           1253
4           890
5           461
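
As a quick sanity check on the class balance, the counts listed above can be summed and turned into proportions. The following is only a minimal sketch: the numbers are copied from the table, while the grouping of 1-2 as negative and 4-5 as positive is an assumption made for illustration and is not stated on this page.

```python
# Document counts per polarity value, copied from the table above.
counts = {1: 351, 2: 923, 3: 1253, 4: 890, 5: 461}

total = sum(counts.values())

# Share of the corpus that falls in each polarity class.
shares = {label: n / total for label, n in counts.items()}

# Hypothetical binary grouping (an assumption, not stated on this page):
# 1-2 treated as negative, 4-5 as positive, 3 left out as neutral.
negative = counts[1] + counts[2]
positive = counts[4] + counts[5]

print(f"total documents: {total}")
for label, share in shares.items():
    print(f"polarity {label}: {counts[label]:>5} ({share:.1%})")
print(f"negative (1-2): {negative}, positive (4-5): {positive}")
```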

This corpus may be used for research purposes only. If you use it, you must cite the following paper:

Martín-Valdivia, M. T., Martínez-Cámara, E., Perea-Ortega, J. M., & Ureña-López, L. A. (2012). Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches. Expert Systems with Applications.

http://dx.doi.org/10.1016/j.eswa.2012.12.084

For any questions about the corpus, send an email to José M. Perea or Eugenio Martínez Cámara.

Resource files:

MCE-corpus.tar.gz
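
The internal layout of the archive is not described on this page, so a reasonable first step after downloading it is to list its members before extracting anything. The sketch below uses Python's standard tarfile module; the helper name list_members and the local path are hypothetical.

```python
import tarfile


def list_members(archive_path):
    """Return the names of all members in a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return archive.getnames()


if __name__ == "__main__":
    # Assumes the archive was saved in the current working directory.
    for name in list_members("MCE-corpus.tar.gz"):
        print(name)
```

Listing the members first makes it possible to see how the reviews and their polarity labels are organized without assuming a particular directory structure.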