Resource type
Corpora
Description
COAH is a corpus of hotel reviews for document-level polarity classification. The corpus is composed of 1816 reviews from TripAdvisor, each scored on a scale from 1 (negative) to 5 (positive). The number of opinions in each class is:
| Statistic | Value |
|---|---|
| Number of opinions | 1816 |
| Number of tokens | 272446 |
| Number of words | 239749 |
| Number of unique words | 154297 |
| Lexical diversity (unique words / words) | 0.6435 |
| Number of characters | 1372737 |
| Number of characters without whitespace | 1135306 |
| Number of nouns | 55530 |
| Number of verbs | 40318 |
| Number of adjectives | 19935 |
| Number of adverbs | 16629 |
| Number of lemmas | 239749 |
| Number of unique lemmas | 138549 |
| Lemma diversity (unique lemmas / lemmas) | 0.577 |
| Number of senses | 106205 |
| Number of unique senses | 77397 |
| Mean sentence length | 23.245 |
| Proportion of nouns (nouns / words) | 0.231 |
| Proportion of verbs (verbs / words) | 0.168 |
| Proportion of adjectives (adjectives / words) | 0.083 |
| Proportion of adverbs (adverbs / words) | 0.069 |
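The ratio rows in the table can be recomputed from the count rows: lexical diversity is unique words divided by words, lemma diversity is unique lemmas divided by lemmas, and each part-of-speech mean is that category's count divided by the number of words. The sketch below checks this. It assumes, from the arithmetic alone, that the published values were truncated rather than rounded (for example, 154297 / 239749 = 0.64357…, which rounds to 0.6436 but is listed as 0.6435). The names `ROWS`, `truncate`, and `check_ratios` are hypothetical and are not part of any COAH tooling.

```python
import math

# Counts copied from the statistics table; "reported" is the ratio as published.
# Assumption (hypothetical reconstruction): each ratio is numerator / denominator,
# truncated (not rounded) to the number of decimals shown in the table.
ROWS = {
    # name: (numerator, denominator, decimals, reported)
    "Lexical diversity": (154_297, 239_749, 4, 0.6435),
    "Lemma diversity": (138_549, 239_749, 3, 0.577),
    "Proportion of nouns": (55_530, 239_749, 3, 0.231),
    "Proportion of verbs": (40_318, 239_749, 3, 0.168),
    "Proportion of adjectives": (19_935, 239_749, 3, 0.083),
    "Proportion of adverbs": (16_629, 239_749, 3, 0.069),
}


def truncate(x: float, decimals: int) -> float:
    """Cut x down to the given number of decimals without rounding."""
    factor = 10 ** decimals
    return math.floor(x * factor) / factor


def check_ratios(rows):
    """Return {name: (recomputed, reported)} for every ratio row."""
    return {
        name: (truncate(num / den, decimals), reported)
        for name, (num, den, decimals, reported) in rows.items()
    }


if __name__ == "__main__":
    for name, (recomputed, reported) in check_ratios(ROWS).items():
        status = "OK" if math.isclose(recomputed, reported) else "MISMATCH"
        print(f"{name}: recomputed={recomputed} reported={reported} {status}")
```

Under the truncation assumption all six rows match. With `round` instead of `truncate`, lexical diversity, lemma diversity, and the noun proportion would come out one unit higher in the last decimal.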
How to cite
Molina-González, M. D., Martínez-Cámara, E., Martín-Valdivia, M. T., Ureña-López, L. A. (2014). Cross-domain sentiment analysis using Spanish opinionated words. Natural Language Processing and Information Systems, Lecture Notes in Computer Science, vol. 8455, pp. 214-219. Springer International Publishing. DOI: 10.1007/978-3-319-07983-7_28
For any questions about the corpus, send an email to M. Dolores Molina or Eugenio Martínez.
Link