COAH

Resource type:

Corpora

Description:

COAH is a corpus of hotel reviews for document-level polarity classification. It consists of 1816 reviews from TripAdvisor, each scored on a scale from 1 (negative) to 5 (positive). The number of opinions per class is:

Rating 1 2 3 4 5 Total
#Opinions 312 199 285 489 531 1816
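
Ratings 4 and 5 together account for more than half of the opinions, so the classes are not balanced. As a minimal sketch, the following Python snippet checks the total and prints each class's share using only the counts in the table; the variable names are illustrative and not part of the resource.

```python
# Opinion counts per rating, copied from the table above.
counts = {1: 312, 2: 199, 3: 285, 4: 489, 5: 531}

total = sum(counts.values())
# Fraction of the corpus that falls in each rating class.
shares = {rating: n / total for rating, n in counts.items()}

for rating, n in counts.items():
    print(f"rating {rating}: {n:4d} opinions ({shares[rating]:.1%})")
print(f"total: {total}")
```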

Some linguistic features of the corpus are:

Number of opinions 1816
Number of tokens 272446
Number of words 239749
Number of unique words 154297
Lexical diversity 0.6435
Number of characters 1372737
Number of characters excluding whitespace 1135306
Number of nouns 55530
Number of verbs 40318
Number of adjectives 19935
Number of adverbs 16629
Number of lemmas 239749
Number of unique lemmas 138549
Lemma diversity 0.577
Number of senses 106205
Number of unique senses 77397
Mean sentence length 23.245
Mean of nouns 0.231
Mean of verbs 0.168
Mean of adjectives 0.083
Mean of adverbs 0.069
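
The diversity and mean figures are consistent with simple ratios over the number of words (239749), truncated rather than rounded. This is an inference from the numbers, not a definition given by the authors. A short Python sketch under that assumption reproduces the listed values; the helper `truncate` and the variable names are hypothetical.

```python
import math

# Counts copied from the list above.
words = 239749
unique_words = 154297
unique_lemmas = 138549
pos_counts = {"nouns": 55530, "verbs": 40318,
              "adjectives": 19935, "adverbs": 16629}


def truncate(value, digits):
    """Cut a positive value to `digits` decimal places without rounding."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


# Assumption: every ratio uses the number of words as its denominator.
lexical_diversity = truncate(unique_words / words, 4)
lemma_diversity = truncate(unique_lemmas / words, 3)
pos_means = {tag: truncate(n / words, 3) for tag, n in pos_counts.items()}

print("Lexical diversity:", lexical_diversity)
print("Lemma diversity:", lemma_diversity)
for tag, mean in pos_means.items():
    print(f"Mean of {tag}:", mean)
```

The mean sentence length cannot be checked the same way, because the number of sentences is not listed.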

How to cite:

Molina-González, M. D., Martínez-Cámara, E., Martín-Valdivia, M. T., Ureña-López, L. A. (2014). Cross-domain sentiment analysis using Spanish opinionated words. Natural Language Processing and Information Systems, Lecture Notes in Computer Science, vol. 8455, pp. 214-219. Springer International Publishing. DOI: 10.1007/978-3-319-07983-7_28

Files of the resource:

corpus_coah.xml

For any questions about the corpus, send an email to M. Dolores Molina or Eugenio Martínez.