» 20-Newsgroups

Resource type:

Corpora

Description:

20000 messages taken from 20 Usenet newsgroups. Available for scientific use.

Resource link:

» AGFL

Resource type:

NLP and IR Software

Description:

System for developing natural language grammars and automatically generating efficient analyzers for them. Available for Windows and Unix. GNU GPL License.

Resource link:

» Apertium

Resource type:

NLP and IR Software

Description:

Open-source machine translation system for the languages of Spain. Runs on 32-bit MS Windows (95/98/NT/2000/XP) and POSIX systems (Linux, BSD, Unix). GPL License.

Resource link:

» BabelNet

Resource type:

NLP and IR Software

Description:

BabelNet is both a multilingual encyclopedic dictionary, with lexicographic and encyclopedic coverage of terms in 50 languages, and a semantic network which connects concepts and named entities in a very large network of semantic relations, made up of more than 9 million entries.

Resource link:

» Bayesian Logistic Regression Software

Resource type:

Machine Learning and Data Mining Software

Description:

This software implements Bayesian logistic regression with two choices of prior: Gaussian and Laplace (also known as double exponential). Free for non-commercial use. Available for Windows and Linux.

Resource link:

» Bayesian Multinomial Regression Software

Resource type:

Machine Learning and Data Mining Software

Description:

This software implements Bayesian multinomial logistic regression. Free for non-commercial use. Available for Windows and Linux.

Resource link:

» BoosTexter

Resource type:

Machine Learning and Data Mining Software

Description:

Text classifier based on boosting. It can handle multiple attributes (textual, discrete or continuous), data with missing attributes, multiclass problems, and large, clean data sets. Free license for non-commercial use only.

Resource link:

» BOW

Resource type:

NLP and IR Software

Description:

C library for statistical language modeling, information retrieval and text classification. For Unix and Windows NT. LGPL License.

Resource link:

» CCG-NER

Resource type:

NLP and IR Software

Description:

Named entity tagging. The package incorporates versions of SNoW (a network of classifiers) and FEX, together with an inference module. The result is a robust system that performs well on new data. Free license for academic and research use.

Resource link:

» COAH

Resource type:

Corpora

Description:

COAH is a corpus of hotel reviews for document-level polarity classification. It comprises 1816 reviews from TripAdvisor, each scored on a scale from 1 (negative) to 5 (positive). The number of opinions per class is:

Rating      1    2    3    4    5    Total
#Opinions   312  199  285  489  531  1816

Some linguistic features of the corpus are:

Number of opinions 1816
Number of tokens 272446
Number of words 239749
Number of unique words 154297
Lexical diversity 0.6435
Number of characters 1372737
Number of characters without whitespace 1135306
Number of nouns 55530
Number of verbs 40318
Number of adjectives 19935
Number of adverbs 16629
Number of lemmas 239749
Number of unique lemmas 138549
Lemma diversity 0.577
Number of senses 106205
Number of unique senses 77397
Mean sentence length 23.245
Mean of nouns 0.231
Mean of verbs 0.168
Mean of adjectives 0.083
Mean of adverbs 0.069
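
The ratio figures in the list above appear to follow from the raw counts. The sketch below is a minimal check, assuming that lexical diversity is unique words divided by words, that lemma diversity is unique lemmas divided by lemmas, and that the part-of-speech means are counts divided by the number of words. These definitions are an assumption and are not stated in the resource description. Under them, the computed values agree with the listed ones to about three decimal places.

```python
# Raw counts copied from the COAH statistics listed above.
counts = {
    "words": 239749,
    "unique_words": 154297,
    "lemmas": 239749,
    "unique_lemmas": 138549,
    "nouns": 55530,
    "verbs": 40318,
    "adjectives": 19935,
    "adverbs": 16629,
}


def ratio(part: int, whole: int) -> float:
    """Return part / whole, rejecting an empty denominator."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return part / whole


# Assumed definitions (not given in the source description).
derived = {
    "lexical_diversity": ratio(counts["unique_words"], counts["words"]),
    "lemma_diversity": ratio(counts["unique_lemmas"], counts["lemmas"]),
    "mean_nouns": ratio(counts["nouns"], counts["words"]),
    "mean_verbs": ratio(counts["verbs"], counts["words"]),
    "mean_adjectives": ratio(counts["adjectives"], counts["words"]),
    "mean_adverbs": ratio(counts["adverbs"], counts["words"]),
}

for name, value in derived.items():
    print(f"{name}: {value:.4f}")
```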

How to cite:

Molina-González, M. D., Martínez-Cámara, E., Martín-Valdivia, M. T., Ureña-López, L. A. (2014). Cross-domain sentiment analysis using Spanish opinionated words. Natural Language Processing and Information Systems, Lecture Notes in Computer Science, vol. 8455, pp. 214-219. Springer International Publishing. DOI: 10.1007/978-3-319-07983-7_28

Files of the resource:

corpus_coah.xml

For any questions about the corpus, please send an email to M. Dolores Molina or Eugenio Martínez.

» COAR

Resource type:

Corpora

Description:

COAR is a corpus of restaurant reviews for document-level polarity classification. It comprises 2202 reviews from TripAdvisor, each scored on a scale from 1 (negative) to 5 (positive). The number of opinions per class is:

Rating      1    2    3    4    5    Total
#Opinions   565  246  188  333  870  2202
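
Because both review corpora are intended for polarity classification, the class balance matters when planning experiments. The sketch below is illustrative only: it re-enters the per-rating counts reported for COAH and COAR, checks that they sum to the stated totals, and prints each rating's share of its corpus. The variable and function names are hypothetical and are not part of either resource.

```python
# Per-rating opinion counts as reported in the COAH and COAR descriptions.
RATING_COUNTS = {
    "COAH": {1: 312, 2: 199, 3: 285, 4: 489, 5: 531},
    "COAR": {1: 565, 2: 246, 3: 188, 4: 333, 5: 870},
}
REPORTED_TOTALS = {"COAH": 1816, "COAR": 2202}


def class_shares(counts: dict) -> dict:
    """Return the fraction of reviews that fall in each rating class."""
    total = sum(counts.values())
    return {rating: n / total for rating, n in counts.items()}


for name, counts in RATING_COUNTS.items():
    # Sanity check: the per-class counts should add up to the stated total.
    if sum(counts.values()) != REPORTED_TOTALS[name]:
        raise ValueError(f"{name}: class counts do not match reported total")
    shares = class_shares(counts)
    summary = ", ".join(f"{r}: {s:.1%}" for r, s in sorted(shares.items()))
    print(f"{name} ({REPORTED_TOTALS[name]} reviews) -> {summary}")
```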

Files of the resource:

CorpusCOAR.xlsx

For any questions about the corpus, please send an email to M. Dolores Molina or Eugenio Martínez.

» Collins Parser

Resource type:

NLP and IR Software

Description:

Natural language parser. GNU License

Resource link:

» CoolTran

Resource type:

NLP and IR Software

Description:

Cross-platform translator of terms between different languages. It ships with several preinstalled language dictionaries, and more can be installed. The application can also connect to a “collaborative” Internet database. Implemented in Java. GPL License.

Resource link:

» COPOD

Resource type:

Corpora

Description:

The Corpus Of Patient Opinions in Dutch (COPOD) was built by crawling the well-known medical forum Zorgkaart Nederland on June 28, 2016. It is composed of 156,975 patient reviews describing experiences with physicians from 60 specialties. Each review contains a rating for six aspects (accommodation, appointment, therapy, staff attention, information and listening) on a scale from 1 to 10 stars, plus an overall rating that is the average of these aspect ratings.
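
The relationship between the aspect ratings and the overall rating can be sketched as follows. This is a minimal illustration that assumes a plain arithmetic mean over the six aspects. The field names and the example review are hypothetical, and the actual layout of COPOD.zip may differ.

```python
# The six rated aspects named in the description (identifiers are hypothetical).
ASPECTS = (
    "accommodation",
    "appointment",
    "therapy",
    "staff_attention",
    "information",
    "listening",
)


def overall_rating(aspect_ratings: dict) -> float:
    """Average the six aspect ratings, each expected on the 1-10 scale."""
    missing = [a for a in ASPECTS if a not in aspect_ratings]
    if missing:
        raise ValueError(f"missing aspect ratings: {missing}")
    values = [aspect_ratings[a] for a in ASPECTS]
    for aspect, value in zip(ASPECTS, values):
        if not 1 <= value <= 10:
            raise ValueError(f"{aspect} rating {value} is outside 1-10")
    return sum(values) / len(values)


# Made-up review used only to exercise the function.
example_review = {
    "accommodation": 8,
    "appointment": 7,
    "therapy": 9,
    "staff_attention": 10,
    "information": 6,
    "listening": 8,
}
print(overall_rating(example_review))
```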

How to cite:

Jiménez-Zafra, S. M., Martín-Valdivia, M. T., Maks, I., & Izquierdo, R. (2017). Analysis of patient satisfaction in Dutch and Spanish online reviews. Procesamiento del Lenguaje Natural, 58, 101-108.

Files of the resource:

COPOD.zip

For any questions related to the corpus, please send an email to Salud María Jiménez Zafra or M. Teresa Martín-Valdivia.