» LingPipe

Resource type:

NLP and IR Software

Description:

Java toolkit for natural language processing, especially entity recognition. The LingPipe architecture is designed to be efficient, scalable, reusable and robust. License: Alias-i Royalty Free.

Resource link:

» Lingua CPAN Modules

Resource type:

NLP and IR Software

Description:

Perl modules for entity recognition, dictionaries, taggers … For Unix, Windows, Macintosh, DOS, OS/2, VMS and MVS. License: GPL.

Resource link:

» LVQ_PAK

Resource type:

Machine Learning and Data Mining Software

Description:

Package containing the programs needed to use LVQ (Learning Vector Quantization): an implementation of the LVQ neural network, which uses both supervised and unsupervised learning for pattern classification. For Windows and Unix. License: unknown.
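To illustrate how LVQ classifies patterns, here is a minimal Python sketch of the classic LVQ1 update rule. It is not LVQ_PAK's code; the linearly decaying learning rate, the number of epochs and the hand-picked initial prototypes are assumptions chosen for the example. Each labeled prototype is pulled toward training samples of its own class that it wins, pushed away from samples of other classes, and new points take the label of the nearest prototype.

```python
import math
import random


def nearest(prototypes, x):
    """Return the index of the prototype closest to x (squared Euclidean distance)."""
    best, best_d = 0, math.inf
    for i, (w, _) in enumerate(prototypes):
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_d:
            best, best_d = i, d
    return best


def train_lvq1(prototypes, data, alpha=0.1, epochs=20, seed=0):
    """LVQ1 sketch: move the winning prototype toward a same-class sample, away otherwise.

    prototypes: list of (weight vector, label); data: list of (feature vector, label).
    The decay schedule and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    protos = [(list(w), label) for w, label in prototypes]
    samples = list(data)
    for epoch in range(epochs):
        rate = alpha * (1 - epoch / epochs)  # linearly decreasing learning rate
        rng.shuffle(samples)
        for x, y in samples:
            i = nearest(protos, x)
            w, label = protos[i]
            sign = 1.0 if label == y else -1.0  # attract if labels match, repel if not
            protos[i] = ([wi + sign * rate * (xi - wi) for wi, xi in zip(w, x)], label)
    return protos


def classify(prototypes, x):
    """Label x with the class of its nearest prototype."""
    return prototypes[nearest(prototypes, x)][1]
```

For example, with two classes near (0, 0) and (1, 1) and one prototype per class, `classify` returns the class of whichever trained prototype lies closest to the query point.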

Resource link:

» MCE Corpus

Resource type:

Corpora

Description:

MuchoCine corpus in English (MCE) is the English translation of the MuchoCine corpus of Spanish movie reviews. The MuchoCine corpus was developed by the researcher Fermín Cruz Mata and presented in 2008 in issue 41 of the journal Natural Language Processing, in the paper titled “Document Classification Based on Opinion: Experiments with a Corpus of Spanish Cinema Reviews”.

The paper “Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches” tests the validity of a methodology for polarity classification in Spanish that combines three classifiers: two supervised classifiers (one trained on English texts and one on texts in the other language) and an unsupervised classifier that uses an English-language sentiment analysis resource. This methodology was previously proposed for Arabic opinions in the paper “Improving Polarity Classification of Bilingual Parallel Corpora combining Machine Learning and Semantic Orientation approaches” (in press).
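The description does not say how the three classifiers' outputs are merged. Majority voting is one simple option for combining three binary classifiers, and the sketch below assumes it; the paper's actual combination scheme may differ. The two stand-in supervised classifiers and the tiny word lists in `POSITIVE` and `NEGATIVE` are hypothetical placeholders, not the resources used in the paper.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier maps a review text to "positive" or "negative".
Classifier = Callable[[str], str]

# Hypothetical toy lexicon standing in for an English sentiment resource.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "boring", "awful"}


def lexicon_classifier(text: str) -> str:
    """Unsupervised stand-in: count lexicon hits; ties default to positive (arbitrary choice)."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score >= 0 else "negative"


def majority_vote(classifiers: Sequence[Classifier], text: str) -> str:
    """Return the label predicted by most classifiers.

    With three voters and two labels a tie is impossible.
    """
    votes = Counter(clf(text) for clf in classifiers)
    label, _ = votes.most_common(1)[0]
    return label
```

In practice the first two entries of `classifiers` would be the trained supervised models for each language and the third the lexicon-based one; using an odd number of voters is what keeps binary decisions tie-free.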

The polarity of each document in the corpus is rated on a scale from 1 to 5, where 1 means very bad and 5 means very good. The distribution of documents by polarity is:

Polarity | Number of documents
1 | 351
2 | 923
3 | 1,253
4 | 890
5 | 461

This corpus may be used for research purposes only. If you use it, you must cite the following paper:

Martín-Valdivia, M. T., Martínez-Cámara, E., Perea-Ortega, J. M., & Alfonso Ureña-López, L. (2012). Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches. Expert Systems with Applications.
http://dx.doi.org/10.1016/j.eswa.2012.12.084

For any questions about the corpus, send an email to José M. Perea or Eugenio Martínez Cámara.

Resource files:

MCE-corpus.tar.gz

» MDC

Resource type:

NLP and IR Software

Description:

Multidimensional Clustering. A project addressing the design and implementation of a new physical data organization scheme in version 8 of the DB2 database. It provides a general-purpose approach to multidimensional data access.
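A toy Python model of the idea behind multidimensional clustering follows, assuming the usual description of MDC: rows that share the same values on every clustering dimension form a cell, each cell is stored in its own blocks, and a per-dimension block index lets a query read only the blocks that match. This is not DB2 code; the function names, the dictionary row format and the tiny `block_size` are hypothetical choices for illustration.

```python
from collections import defaultdict


def build_mdc(rows, dimensions, block_size=2):
    """Group rows into cells by dimension values and split each cell into fixed-size blocks.

    Returns (blocks, block_index), where blocks is a list of (cell key, rows) and
    block_index maps dimension -> value -> set of block ids.
    """
    cells = defaultdict(list)
    for row in rows:
        cells[tuple(row[d] for d in dimensions)].append(row)

    blocks = []
    block_index = {d: defaultdict(set) for d in dimensions}
    for key, cell_rows in cells.items():
        for start in range(0, len(cell_rows), block_size):
            block_id = len(blocks)
            blocks.append((key, cell_rows[start:start + block_size]))
            # Every block belongs to exactly one cell, so it is indexed once per dimension.
            for d, value in zip(dimensions, key):
                block_index[d][value].add(block_id)
    return blocks, block_index


def scan(blocks, block_index, **conditions):
    """Answer an equality query by intersecting block ids from the per-dimension indexes."""
    ids = None
    for d, value in conditions.items():
        found = block_index[d].get(value, set())
        ids = set(found) if ids is None else ids & found
    # Only the matching blocks are read; an empty query returns nothing in this sketch.
    return [row for i in sorted(ids or ()) for row in blocks[i][1]]
```

Because whole blocks belong to a single cell, a query such as `scan(blocks, idx, region="EU", year=2003)` never touches rows from other cells, which is the physical-layout benefit the description refers to.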

Resource link:

» MeSH

Resource type:

NLP and IR Software

Description:

Medical Subject Headings. A key tool for searching for information in the Medline database, MeSH is the controlled vocabulary used by Medline and other biomedical databases. It consists of more than 33,000 terms arranged in hierarchical structures called trees.
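Positions in the MeSH trees are written as dot-separated tree numbers, and a descriptor can appear in several trees, so it may have more than one tree number. The Python sketch below shows how such a hierarchy can be navigated by prefix: finding a parent, testing descent, and "exploding" a term to everything beneath it. The codes `X01`, `X01.100` and so on and the descriptor names are hypothetical placeholders, not real MeSH entries.

```python
from typing import Dict, List, Optional, Set


def parent(tree_number: str) -> Optional[str]:
    """Drop the last dot-separated component; top-level numbers have no parent."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


def is_descendant(tree_number: str, ancestor: str) -> bool:
    """True if tree_number lies strictly below ancestor (prefix match on whole components)."""
    return tree_number.startswith(ancestor + ".")


def explode(tree: Dict[str, List[str]], root: str) -> Set[str]:
    """Return descriptors having at least one tree number equal to or below root."""
    return {
        name
        for name, numbers in tree.items()
        if any(n == root or is_descendant(n, root) for n in numbers)
    }
```

Matching on `ancestor + "."` rather than on the bare prefix keeps a hypothetical `X010.5` from being treated as a child of `X01`.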

Resource link:

» MG System

Resource type:

NLP and IR Software

Description:

IR system developed by the authors of “Managing Gigabytes”.

Resource link:

» Mifluz

Resource type:

NLP and IR Software

Description:

C++ library for building inverted indexes over text. The index is dynamically updatable and scalable, and it uses a controlled amount of memory. License: GPL.
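As a reminder of what an inverted index is and what "dynamically updatable" means, here is a minimal in-memory Python sketch. It is not Mifluz's C++ API and ignores its on-disk storage and memory control; the class and method names and the simple regular-expression tokenizer are assumptions for illustration. Each term maps to the set of documents containing it, and documents can be added, replaced or removed without rebuilding the index.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy updatable inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.docs = {}                    # doc id -> its terms, so it can be removed later

    @staticmethod
    def tokenize(text):
        """Lowercase word tokens (a deliberately naive tokenizer)."""
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id, text):
        """Index a document, replacing any previous version with the same id."""
        if doc_id in self.docs:
            self.remove(doc_id)
        terms = set(self.tokenize(text))
        self.docs[doc_id] = terms
        for term in terms:
            self.postings[term].add(doc_id)

    def remove(self, doc_id):
        """Delete a document and drop posting lists that become empty."""
        for term in self.docs.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
            if not self.postings[term]:
                del self.postings[term]

    def search(self, query):
        """Boolean AND query: ids of documents containing every query term."""
        terms = self.tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

Keeping the per-document term sets is what makes deletion cheap here; a disk-based library has to solve the same problem with very different data structures.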

Resource link:

» Minipar

Resource type:

NLP and IR Software

Description:

Broad-coverage parser for English. Available for Linux, Solaris and Windows 95/98. Free for non-commercial use.

Resource link:

» MLC++

Resource type:

Machine Learning and Data Mining Software

Description:

C++ library for supervised machine learning. Its main objectives are to provide tools that help with data mining, accelerate the development of new mining algorithms, increase software reliability, provide comparison tools and display information visually. Free license for non-commercial use only.

Resource link:

» MPEG-7 XM

Resource type:

NLP and IR Software

Description:

MPEG-7 is a standard for representing audiovisual information that allows multimedia content to be described. The XM (eXperimentation Model) is the software simulation platform for MPEG-7 Descriptors (Ds), Description Schemes (DSs), Coding Schemes (CSs) and the Description Definition Language (DDL).

» MySVM

Resource type:

Machine Learning and Data Mining Software

Description:

Implementation of Support Vector Machines for pattern recognition, regression and estimation. For Windows and Unix. Free for non-commercial use.
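For readers new to SVMs, the sketch below trains a linear SVM classifier in plain Python. It does not reproduce mySVM's solver: the Pegasos stochastic subgradient method is swapped in for brevity, it covers only the linear, two-class case (no kernels, regression or estimation), and the regularization constant `lam` and epoch count are illustrative assumptions. A constant feature is appended so that the last weight acts as a bias term.

```python
import random


def train_linear_svm(data, lam=0.01, epochs=50, seed=0):
    """Pegasos-style SGD on the L2-regularized hinge loss.

    data: list of (feature vector, label) with label in {+1, -1}.
    Returns a weight vector whose last entry is the (regularized) bias.
    """
    rng = random.Random(seed)
    samples = [(list(x) + [1.0], y) for x, y in data]  # append constant bias feature
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularization step: w <- (1 - eta * lam) * w, and eta * lam == 1 / t.
            w = [(1.0 - 1.0 / t) * wi for wi in w]
            if margin < 1.0:  # hinge loss is active: move toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

On small, linearly separable data such as two clusters on either side of the origin, `predict` recovers the training labels; kernel SVMs like the one this entry describes instead solve the dual problem and can learn non-linear boundaries.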

Resource link:

» NIST Sparse BLAS

Resource type:

Mathematical Software

Description:

C library for linear algebra with sparse matrices. License: undetermined.
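To show what "linear algebra with sparse matrices" involves at its core, here is a Python sketch of a sparse matrix-vector product using compressed sparse row (CSR) storage, a format widely used for sparse matrices. It is not the NIST library's C interface, whose routine names and handle-based API differ; `to_csr` and `csr_matvec` are hypothetical helper names. Only nonzero entries are stored, so the product costs time proportional to the number of nonzeros rather than to rows times columns.

```python
def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays.

    values: nonzero entries row by row; col_idx: their column indices;
    row_ptr: row i occupies values[row_ptr[i]:row_ptr[i + 1]].
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x touching only the stored nonzeros of A."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y
```

For the 3 × 3 matrix with rows `[4, 0, 0]`, `[0, 0, 2]` and `[1, 3, 0]` and the vector `[1, 2, 3]`, only four multiplications are performed instead of nine.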

Resource link:

» OCA Corpus

Resource type:

Corpora

Description:

OCA is an Arabic corpus of movie reviews. It was built from Arabic comments collected from the websites listed in the following table:

Name | Webpage | Rating system | Positive | Negative
Cinema Al Rasid | http://cinema.al-rasid.com/ | 10 | 36 | 1
Film Reader | http://filmreader.blogspot.com/ | 5 | 0 | 92
Hot Movie Reviews | http://hotmoviews.blogspot.com | 5 | 45 | 4
Elcinema | http://www.elcinema.com | 10 | 0 | 56
Grind House | http://grindh.com | 10 | 38 | 0
Mzyondubai | http://www.mzyondubai.com | 10 | 0 | 15
Aflamee | http://aflamee.com | 5 | 0 | 1
Grind Film | http://grindfilm.blogspot.com/ | 10 | 0 | 8
Cinema Gate | http://www.cingate.net | Bad/Good | 0 | 1
Emad Ozery Blog | http://emadozery.blogspot.com | 10 | 0 | 1
Fil Fan | http://www.filfan.com | 5 | 81 | 20
Sport4Ever | http://sport4ever.maktoob.com | 10 | 0 | 1
DVD4ArabPos | http://dvd4arab.maktoob.com | 10 | 11 | 0
Gamraii | http://www.gamraii.com | 10 | 39 | 0
Shadows and Phantoms | http://shadowsandphantoms.blogspot.com | 10 | 0 | 50
Total | | | 250 | 250

The OCA corpus was generated in October 2010. Some of its statistics are shown in the following table:

Statistic | Negative | Positive
Total documents | 250 | 250
Total tokens | 94,556 | 121,392
Average tokens per comment | 378 | 485
Total sentences | 4,881 | 3,137
Average sentences per comment | 20 | 13

Rushdi-Saleh, M., Martín-Valdivia, M. T., Alfonso Ureña-López, L. & Perea-Ortega, J. M. (2011). OCA: Opinion corpus for Arabic. Journal of the American Society for Information Science and Technology.
http://dx.doi.org/10.1002/asi.21598

For any questions about the corpus, send an email to Mohammed Saleh or José M. Perea.

Resource files:

OCA-corpus.zip

» OCTAVE

Resource type:

Mathematical Software

Description:

High-level language for numerical computation. It provides a command-line interface for solving linear and nonlinear problems numerically and for performing other numerical experiments, using a language that is mostly compatible with Matlab. Users can define functions in Octave’s own language or dynamically load modules written in C++, C, Fortran or other languages. License: GPL.

Resource link: