OCA (Opinion Corpus for Arabic) is a corpus of movie reviews in Arabic. It was built from comments collected from the web pages shown in the following table:
|Cinema Al Rasid||http://cinema.al-rasid.com/||10||36||1|
|Hot Movie Reviews||http://hotmoviews.blogspot.com||5||45||4|
|Emad Ozery Blog||http://emadozery.blogspot.com||10||0||1|
|Shadows and Phantoms||http://shadowsandphantoms.blogspot.com||10||0||50|
The corpus was generated in October 2010. Some statistics on it are shown in the following table:
|Average tokens per comment||378||485|
|Average sentences per comment||20||13|
Rushdi-Saleh, M., Martín-Valdivia, M. T., Alfonso Ureña-López, L. & Perea-Ortega, J. M. (2011). OCA: Opinion corpus for Arabic. Journal of the American Society for Information Science and Technology.
The SFU Review SP-NEG corpus is an extension of the SFU Spanish Review Corpus (Brooke et al., 2009) with annotations about negation and its scope. It is a collection of 400 reviews of cars, hotels, washing machines, books, cell phones, music, computers and movies from the Ciao.es website. Each of the eight domains contains 25 positive and 25 negative reviews. Each review is annotated at two levels. At the token level, every token carries its lemma and part of speech (PoS). At the sentence level, the annotation marks the negative keywords, their linguistic scope, the event, and how negation affects the polarity of the sentence (whether the polarity changes, or its value is increased or reduced). Intensifiers and diminishers are also taken into account.
How to cite:
Jiménez-Zafra, S. M., Taulé, M., Martín-Valdivia, M. T., Ureña-López, L. A., & Martí, M. A. (2018). SFU Review SP-NEG: a Spanish corpus annotated with negation for sentiment analysis. A typology of negation patterns. Language Resources and Evaluation, 52(2), 533-569.
Jiménez-Zafra, S. M., Martín-Valdivia, M. T., Molina-González, M. D., & Ureña-López, L. A. (2018). Relevance of the SFU Review SP-NEG corpus annotated with the scope of negation for supervised polarity classification in Spanish. Information Processing & Management, 54(2), 240-251.
Jiménez-Zafra, S. M., Martin, M., Lopez, L. A. U., Marti, T., & Taulé, M. (2016). Problematic cases in the annotation of negation in Spanish. In Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics (ExProM) (pp. 42-48).
Martí, M. A., Martín-Valdivia, M. T., Taulé, M., Jiménez-Zafra, S. M., Nofre, M., & Marsó, L. (2016). La negación en español: análisis y tipología de patrones de negación. Procesamiento del Lenguaje Natural, 57, 41-48.
This corpus was prepared by the SINAI group in December 2008. SINAI SA (Sentiment Analysis) was created by crawling the Amazon website. Nearly 2,000 comments about different cameras were extracted.
Structure: The SINAI corpus contains 5 directories, one for each star rating (e.g., directory 1 contains the comments rated with one star). Each directory contains one plain-text file per document/comment.
The number of comments per rating is as follows:
- 1…star: 78 comments
- 2…stars: 67 comments
- 3…stars: 97 comments
- 4…stars: 411 comments
- 5…stars: 1,290 comments
Total: 1,943 comments
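The directory layout described above can be read with a few lines of Python. This is only a minimal sketch: it assumes the five directories are literally named `1` to `5` and that the files are UTF-8 text, neither of which is stated in the description. The function names `count_reviews` and `load_reviews` are hypothetical helpers, not part of the resource.

```python
from pathlib import Path
import tempfile


def count_reviews(corpus_root):
    """Return {stars: number_of_files} for directories assumed to be named 1..5."""
    root = Path(corpus_root)
    counts = {}
    for stars in range(1, 6):
        star_dir = root / str(stars)
        if star_dir.is_dir():
            # Each plain-text file is assumed to hold exactly one comment.
            counts[stars] = sum(1 for p in star_dir.iterdir() if p.is_file())
        else:
            counts[stars] = 0
    return counts


def load_reviews(corpus_root):
    """Yield (stars, text) pairs, using the directory name as the label."""
    root = Path(corpus_root)
    for stars in range(1, 6):
        star_dir = root / str(stars)
        if not star_dir.is_dir():
            continue
        for path in sorted(star_dir.iterdir()):
            if path.is_file():
                # The encoding is an assumption; undecodable bytes are replaced.
                yield stars, path.read_text(encoding="utf-8", errors="replace")


if __name__ == "__main__":
    # Build a tiny throwaway corpus just to show the calls; not real data.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "5"
        demo.mkdir()
        (demo / "example.txt").write_text("Great camera.", encoding="utf-8")
        print(count_reviews(tmp))
        print(list(load_reviews(tmp)))
```

Run against the real corpus, `count_reviews` should reproduce the per-star figures listed above if the layout assumption holds.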
Rushdi-Saleh, M., Martín-Valdivia, M. T., Montejo-Ráez, A., & Alfonso Ureña-López, L. (2011). Experiments with SVM to classify opinions in different domains. Expert Systems with Applications.
SOL is a domain-independent list of opinion signal words in Spanish.
The resource was built starting from the word list maintained by Professor Bing Liu (Bing Liu’s Opinion Lexicon). The list was translated automatically with the Reverso translator.
The list consists of 1,397 positive and 3,151 negative words. For more information on how the list was developed, see the article Bilingual Experiments on an Opinion Comparable Corpus (Martínez-Cámara et al., 2013).
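A list of positive and negative words like this one is typically used by counting matches in a text. The sketch below illustrates that idea only; it is not the method of the cited article. The two small word sets are illustrative stand-ins chosen for the example, not entries taken from SOL, and `polarity` is a hypothetical helper.

```python
import re

# Toy stand-ins for the positive and negative lists (NOT taken from SOL).
POSITIVE = {"bueno", "excelente", "genial"}
NEGATIVE = {"malo", "horrible", "lento"}


def polarity(text, positive=POSITIVE, negative=NEGATIVE):
    """Classify text by counting positive and negative lexicon hits."""
    # Lowercase and split on word characters; \w is Unicode-aware for str.
    tokens = re.findall(r"\w+", text.lower())
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With the real lists loaded into the two sets, the same function applies unchanged. Note that plain counting ignores negation and intensifiers.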
Martínez-Cámara, E., Martín-Valdivia, M. T., Molina-González, M. D. & Alfonso Ureña-López, L. (2013). Bilingual Experiments on an Opinion Comparable Corpus. Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis.
This resource contains 6,305 Spanish questions tagged for question classification in question answering. It follows the taxonomy defined in the paper “X. Li and D. Roth. Learning Question Classifiers”, with the following general and detailed categories:
- ABBR: abbreviation, expansion
- DESC: definition, description, manner, reason
- ENTY: animal, body, color, creation, currency, disease/medical, event, food, instrument, language, letter, other, plant, product, religion, sport, substance, symbol, technique, term, vehicle, word
- HUM: description, group, individual, title
- LOC: city, country, mountain, other, state
- NUM: code, count, date, distance, money, order, other, percent, period, speed, temperature, size, weight
Starting from a set of labeled questions in English, this resource was created with questions in Spanish that were labeled and checked by three people.
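The two-level taxonomy listed above maps naturally onto a dictionary from general to detailed categories. The sketch below is an illustration under assumptions: the `GENERAL:detailed` label string and the `parse_label` helper are hypothetical, since the file format of the resource is not described here.

```python
# General category -> detailed categories, as listed above.
TAXONOMY = {
    "ABBR": ["abbreviation", "expansion"],
    "DESC": ["definition", "description", "manner", "reason"],
    "ENTY": [
        "animal", "body", "color", "creation", "currency", "disease/medical",
        "event", "food", "instrument", "language", "letter", "other", "plant",
        "product", "religion", "sport", "substance", "symbol", "technique",
        "term", "vehicle", "word",
    ],
    "HUM": ["description", "group", "individual", "title"],
    "LOC": ["city", "country", "mountain", "other", "state"],
    "NUM": [
        "code", "count", "date", "distance", "money", "order", "other",
        "percent", "period", "speed", "temperature", "size", "weight",
    ],
}


def parse_label(label):
    """Split an assumed 'GENERAL:detailed' label and validate both parts."""
    general, sep, detailed = label.partition(":")
    if not sep or general not in TAXONOMY or detailed not in TAXONOMY[general]:
        raise ValueError(f"unknown question category: {label!r}")
    return general, detailed
```

Validating labels this way catches names such as `other`, which are only meaningful together with their general category.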
García-Cumbreras, M. A., Ureña-López, L. A. & Martínez-Santiago, F. (2006). BRUJA: Question Classification for Spanish. Using Machine Translation and an English Classifier. EACL 2006 Workshop on Multilingual Question Answering – MLQA06.
Tecat stands for text categorization. It is a tool for building automatic multi-label text classifiers. With Tecat you can experiment with different collections and base classifiers in order to build a multi-label classifier.
Montejo-Ráez, A., Ureña-López, L. A., & Steinberger, R. (2004). Adaptive Selection of Base Classifiers in One-Against-All Learning for Large Multi-labeled Collections. Lecture Notes in Computer Science, 3230, 1-12.
If you use it, please send a notification by email to amontejo AT ujaen DOT es.