» COPOD

Resource type:

Corpus

Description:

The Corpus Of Patient Opinions in Dutch (COPOD) was built by crawling the well-known medical forum Zorgkaart Nederland on June 28, 2016. It is composed of 156,975 patient reviews describing experiences with physicians of 60 specialties. Each review contains a rating for six aspects (accommodation, appointment, therapy, staff attention, information and listening) on a scale from 1 to 10 stars, and an overall rating that is the average of these aspect ratings.
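The relation between the aspect ratings and the overall rating can be sketched in a few lines of Python. This is only an illustration: the field names and the sample review are hypothetical, and the actual file format of COPOD.zip is not described here.

```python
from statistics import mean

# Hypothetical field names for the six rated aspects (not taken from the corpus files).
ASPECTS = (
    "accommodation", "appointment", "therapy",
    "staff_attention", "information", "listening",
)

def overall_rating(aspect_ratings):
    """Return the mean of the six aspect ratings (each on the 1-10 scale)."""
    missing = [a for a in ASPECTS if a not in aspect_ratings]
    if missing:
        raise ValueError(f"missing aspect ratings: {missing}")
    return mean(aspect_ratings[a] for a in ASPECTS)

# Made-up review used only to illustrate the computation.
review = {
    "accommodation": 8, "appointment": 7, "therapy": 9,
    "staff_attention": 10, "information": 8, "listening": 9,
}
print(overall_rating(review))  # 8.5
```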

How to cite:

Jiménez-Zafra, S. M., Martín-Valdivia, M. T., Maks, I., & Izquierdo, R. (2017). Analysis of patient satisfaction in Dutch and Spanish online reviews. Procesamiento del Lenguaje Natural, 58, 101-108.

Files of the resource:

COPOD.zip

For any questions related to the corpus, please send an email to Salud María Jiménez Zafra or M. Teresa Martín-Valdivia.

» COPOS

Resource type:

Corpus

Description:

This corpus was extracted by crawling the website www.masquemedicos.com. It is a collection of patient opinions about medical entities from six countries (Chile, Colombia, Ecuador, Spain, Mexico and Venezuela). It is composed of 743 reviews covering 34 medical specialties, of which 109 are negative and 634 are positive. The reviews are rated on a scale from 0 to 5 stars.

How to cite:

del Arco, F. M. P., Valdivia, M. T. M., Zafra, S. M. J., González, M. D. M., & Cámara, E. M. (2016). COPOS: Corpus Of Patient Opinions in Spanish. Application of Sentiment Analysis Techniques. Procesamiento del Lenguaje Natural, 57, 83-90.

For any questions related to the corpus, please send an email to M. Teresa Martín-Valdivia or Flor Miriam Plaza-del-Arco.

» COST

Resource type:

Corpora

Description:

Corpus of Spanish tweets for sentiment analysis. The corpus is composed of 34,634 tweets tagged with noisy labels. It is balanced: 17,317 tweets are positive and 17,317 are negative.

How to cite:

Martínez-Cámara, E., Martín-Valdivia, M. T., Ureña-López, L. A., & Mitkov, R. (2015). Polarity classification for Spanish tweets using the COST corpus. Journal of Information Science, 41(3), 263-272. DOI: 10.1177/0165551514566564.

Resource files:

To obtain the corpus, send an email to Eugenio Martínez Cámara (emcamara@ujaen.es).

» CRiSOL

Resource type:

Lexicon

Description:

CRiSOL is the result of combining two linguistic resources for Sentiment Analysis. One is iSOL, a list of opinion-bearing words in Spanish. The other is the widely known opinion lexicon SentiWordNet. The result is a version of SentiWordNet filtered by the words that appear in iSOL. The iSOL and SentiWordNet information in CRiSOL can be used jointly or independently.

CRiSOL is composed of 8,135 iSOL words, of which 4,434 are also linked to their polarity score in SentiWordNet.
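The idea of keeping every iSOL word and attaching SentiWordNet scores only where a link exists can be sketched as follows. This is a minimal illustration under stated assumptions: the toy dictionaries, the score layout and the function name are all hypothetical, and the way Spanish words were actually linked to SentiWordNet entries is not described here.

```python
# Toy stand-in for iSOL: Spanish word -> polarity label (hypothetical sample).
isol = {"bueno": "positive", "malo": "negative", "genial": "positive"}

# Toy stand-in for SentiWordNet scores already linked to Spanish words:
# word -> (positive score, negative score). The linking step itself is assumed.
sentiwordnet_scores = {
    "bueno": (0.75, 0.0),
    "malo": (0.0, 0.625),
    "casa": (0.0, 0.0),  # not an opinion word in iSOL, so it is filtered out
}

def build_combined_lexicon(isol_words, swn_scores):
    """Keep every iSOL word; attach SentiWordNet scores when a link exists."""
    combined = {}
    for word, polarity in isol_words.items():
        combined[word] = {
            "isol_polarity": polarity,
            "swn_scores": swn_scores.get(word),  # None when there is no link
        }
    return combined

crisol_like = build_combined_lexicon(isol, sentiwordnet_scores)
linked = sum(1 for entry in crisol_like.values() if entry["swn_scores"] is not None)
print(len(crisol_like), linked)  # 3 2
```

Because each entry stores both sources separately, a classifier can rely on the iSOL polarity alone, on the SentiWordNet scores alone, or on both.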

How to cite:

Molina González, M. Dolores, Martínez Cámara, Eugenio, & Martín Valdivia, M. Teresa. (2015). CRiSOL: Opinion Knowledge-base for Spanish. Procesamiento Del Lenguaje Natural, 55, 143-150.
http://journal.sepln.org/sepln/ojs/ojs/index.php/pln/article/view/5226

Files of the resource:

crisol.tar.gz

» DOS

Resource type:

Corpus

Description:

The Drug Opinions Spanish (DOS) corpus was sourced from the web portal https://www.mimedicamento.es, an independent platform for sharing experiences with drugs. It is composed of 877 opinions about the 30 most reviewed drugs as of March 14, 2017. Each review contains the date on which it was posted, the gender and age of the consumer, the disease and the drug used for it, the textual opinion, and a rating for the following satisfaction categories: overall, efficacy, side effects quantity, side effects severity and ease of use. Moreover, each review was manually annotated at the aspect level with the side effects it describes, together with an opinion polarity label and an opinion intensity label according to the patients’ experiences. The corpus has 3,784 sentences containing a total of 2,230 side effects, of which 98 are positive, 2,119 negative and 13 neutral. Regarding intensity, 655 side effects are of high intensity, 1,486 of medium intensity and 89 of low intensity.

How to cite:

Jiménez-Zafra, S. M., Martín-Valdivia, M. T., Molina-González, M. D., & Ureña-López, L. A. (2017). Corpus Annotation for Aspect Based Sentiment Analysis in Medical Domain. Proceedings of the 2nd International Workshop on Extraction and Processing of Rich Semantics from Medical Texts.

Files of the resource:

DOS.zip

For any questions related to the corpus, please send an email to Salud María Jiménez-Zafra or M. Teresa Martín-Valdivia.

» Email SPAM ENRON Corpus

Resource type:

Machine Learning and Data Mining Software

Description:

Spam filter with Naive Bayes
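As an illustration of the technique named above, here is a minimal multinomial Naive Bayes classifier with add-one smoothing. It is a generic sketch and not the code of this resource: the class name, the whitespace tokenizer and the four training messages are all hypothetical.

```python
import math
from collections import Counter

class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()
        self.vocabulary = set()

    def train(self, text, label):
        tokens = text.lower().split()
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocabulary.update(tokens)

    def _log_score(self, tokens, label):
        # log P(label) + sum over tokens of log P(token | label)
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        denominator = sum(self.word_counts[label].values()) + len(self.vocabulary)
        for token in tokens:
            score += math.log((self.word_counts[label][token] + 1) / denominator)
        return score

    def classify(self, text):
        tokens = text.lower().split()
        return max(self.LABELS, key=lambda label: self._log_score(tokens, label))

# Made-up training messages; a real filter would be trained on a labelled email corpus.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win money now", "spam")
spam_filter.train("cheap money offer", "spam")
spam_filter.train("meeting schedule tomorrow", "ham")
spam_filter.train("project report attached", "ham")

print(spam_filter.classify("win cheap money"))           # spam
print(spam_filter.classify("project meeting tomorrow"))  # ham
```

Working in log space avoids numeric underflow on long messages, and the add-one smoothing keeps unseen words from zeroing out a class. The sketch assumes each class has at least one training message.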

» emoti-sp

Resource type:

Lexicon

Description:

Linguistic resource for research purposes in Sentiment Analysis on Spanish tweets. The lexicon is composed of 70 positive emoticons and 46 negative emoticons.
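One simple way an emoticon lexicon of this kind could be used is to count positive and negative emoticons in a tweet. The sketch below assumes a tiny hypothetical subset of emoticons and a hypothetical function name; the real lists must be requested from the authors.

```python
# Hypothetical sample emoticons, not the actual emoti-sp lists.
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}

def emoticon_polarity(tweet):
    """Label a tweet by comparing counts of positive and negative emoticons."""
    tokens = tweet.split()  # only whitespace-separated emoticons are matched
    positive = sum(token in POSITIVE_EMOTICONS for token in tokens)
    negative = sum(token in NEGATIVE_EMOTICONS for token in tokens)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"

print(emoticon_polarity("Me encanta esta película :D"))  # positive
print(emoticon_polarity("Qué mal día :("))               # negative
print(emoticon_polarity("Hoy es lunes"))                 # neutral
```

Emoticons glued to a word (for example "hola:)") are missed by the whitespace split; a real system would need a tweet-aware tokenizer.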

Files of the resource:

To obtain the resource, send an email to Salud M. Jiménez Zafra (sjzafra@ujaen.es) or Eugenio Martínez Cámara (emcamara@ujaen.es).

» eSOL

Resource type:

Lexicon

Description:

eSOL is a list of domain-dependent opinion signal words in Spanish. The domain is movie reviews.

The list was built using a corpus-based approach, in this case over the Spanish Movie Reviews corpus. It is composed of 2,535 positive words and 5,639 negative words. For more information on how the list was developed, see the paper Semantic Orientation for Polarity Classification in Spanish Reviews, cited below.

Molina-González M.D., Martínez-Cámara, E., Martín-Valdivia, M. T. & Perea-Ortega, J. M. (2012). Semantic orientation for polarity classification in Spanish reviews. Expert Systems with Applications.
http://dx.doi.org/10.1016/j.eswa.2013.06.076

Resource files:

esol.tar.gz

» eSOLdomainGlobal

Resource type:

Lexicon

Description:

One of the main problems in Opinion Analysis is generating resources adapted to a specific domain. eSOLdomainGlobal is a set of lists of opinion signal words in Spanish that cover 8 different domains: cars, hotels, washing machines, books, mobile phones, music, computers and movies. The 8 lists were generated from the iSOL lexicon using a corpus-based approach over the Spanish version of the SFU Review Corpus.

Domain       Positive words   Negative words
Cars                   2528             5648
Hotels                 2517             5636
Washers                2520             5639
Books                  2529             5651
Mobile                 2529             5657
Music                  2538             5645
Computers              2527             5644
Films                  2535             5648

Resource files:

eSOLdomainGlobal.rar

» EVOCA Corpus

Resource type:

Corpora

Description:

EVOCA (English Version of OCA) is an English corpus generated by translating the Arabic corpus OCA. It contains movie reviews, divided into 250 positive and 250 negative reviews. The corpus was translated in April 2011. Some statistics on it are shown in the following table:

                               Negative   Positive
Total documents                     250        250
Total tokens                    122,135    153,581
Average tokens per review        488.54     614.32
Total sentences                   5,030      3,483
Average sentences per review      20.12      13.93
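The averages in the table follow from dividing each total by the number of documents in that class. A short check in Python, using only the totals reported above (the dictionary layout and function name are illustrative):

```python
# Totals copied from the statistics table; the structure itself is just for illustration.
stats = {
    "negative": {"documents": 250, "tokens": 122135, "sentences": 5030},
    "positive": {"documents": 250, "tokens": 153581, "sentences": 3483},
}

def per_document_averages(class_stats):
    """Return (average tokens, average sentences) per document, rounded to 2 decimals."""
    documents = class_stats["documents"]
    return (
        round(class_stats["tokens"] / documents, 2),
        round(class_stats["sentences"] / documents, 2),
    )

for label, class_stats in stats.items():
    print(label, per_document_averages(class_stats))
# negative (488.54, 20.12)
# positive (614.32, 13.93)
```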

Rushdi Saleh, M., Martín-Valdivia, M. T., Ureña-López, L. A. & Perea-Ortega, J. M. (2011). Bilingual Experiments with an Arabic-English Corpus for Opinion Mining. Proceedings of Recent Advances in Natural Language Processing, pages 740–745.

For any questions about the corpus, please send an email to Mohammed Saleh or José M. Perea.

Resource files:

EVOCA-corpus.rar

» FIRE

Resource type:

NLP and IR Software

Description:

Flexible Image Retrieval Engine. Given a query image, the goal is to find images in a database that are similar to it. Distributed under the GNU General Public License.

» FOIL

Resource type:

Machine Learning and Data Mining Software

Description:

First Order Inductive Learner. Used to generate Rating Classification Association rules (CARs). A rule may have at most three attributes in its antecedent.

» Freeling

Resource type:

NLP and IR Software

Description:

Library that provides language analysis services. It can be used as an external library or through a command-line interface for analyzing files. Some features: text tokenization, sentence splitting, morphological analysis, detection and classification of named entities, recognition of dates / numbers / money / proportions, PoS tagging, chart-based shallow parsing, detection of physical magnitudes (speed, weight, temperature, density, etc.), and sense annotation based on WordNet. Supports Spanish, Catalan, Italian and Galician.

» GALib

Resource type:

Machine Learning and Data Mining Software

Description:

C++ library to develop applications based on genetic algorithms. For Linux, MacOS and DOS/Windows. GPL License

» GATE

Resource type:

NLP and IR Software

Description:

A platform for the development of IR systems and natural language processing. Very complete and with many modules. Based on Java. Used for all types of language processing tasks. LGPL License
