The MuchoCine corpus in English (MCE) is the English translation of the MuchoCine corpus of Spanish movie reviews. The MuchoCine corpus was developed by the researcher Fermín Cruz Mata and presented in 2008 in issue 41 of the journal Procesamiento del Lenguaje Natural (Natural Language Processing), in the paper "Document Classification based on Opinion: experiments with a corpus of Spanish cinema reviews".
The paper "Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches" tests the validity of a polarity classification methodology for Spanish that combines three classifiers: two supervised classifiers (one trained on texts in English and one on texts in another language) and an unsupervised classifier that uses an English-language resource for sentiment analysis. This methodology was previously proposed for opinions in Arabic in the paper "Improving Polarity Classification of Bilingual Parallel Corpora combining Machine Learning and Semantic Orientation approaches" (in press).
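One simple way to combine the outputs of three classifiers is a majority vote. The sketch below is only an illustration of that idea: this page does not say which combination scheme the paper uses, and the function name `majority_vote` and the label strings are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most classifiers.

    `predictions` holds one label per classifier. With three
    classifiers and two possible labels, a tie cannot occur.
    """
    label, _count = Counter(predictions).most_common(1)[0]
    return label


# Hypothetical outputs for a single review: two supervised
# classifiers and one unsupervised classifier (assumed labels).
votes = ["positive", "negative", "positive"]
print(majority_vote(votes))
```

A vote is the simplest option. Weighted schemes or a meta-classifier trained on the three outputs are alternatives when the classifiers differ in reliability.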
The polarity of each document in the corpus is rated on a scale of 1 to 5, where 1 is very bad and 5 is very good. The details of the corpus are:
Polarity | Number of documents |
---|---|
1 | 351 |
2 | 923 |
3 | 1253 |
4 | 890 |
5 | 461 |
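The counts in the table can be turned into a total and per-rating shares with a few lines of Python. This is a minimal sketch that hard-codes the figures from the table; the variable names `counts`, `total`, and `shares` are chosen here for illustration.

```python
# Number of documents per polarity rating, copied from the table.
counts = {1: 351, 2: 923, 3: 1253, 4: 890, 5: 461}

# Total corpus size and the fraction of documents at each rating.
total = sum(counts.values())
shares = {rating: n / total for rating, n in counts.items()}

for rating, n in counts.items():
    print(f"{rating}: {n} documents ({shares[rating]:.1%})")
print(f"total: {total} documents")
```

The total is 3878 documents, and the neutral rating 3 is the largest class, which is worth keeping in mind when mapping the five ratings to a smaller set of polarity labels.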
This corpus may be used for research purposes only.
Martín-Valdivia, M. T., Martínez-Cámara, E., Perea-Ortega, J. M., & Alfonso Ureña-López, L. (2012). Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches. Expert Systems with Applications.
http://dx.doi.org/10.1016/j.eswa.2012.12.084
For any questions about the corpus, send an email to José M. Perea or Eugenio Martínez Cámara.