» Gift

Resource type:

NLP and IR Software

Description:

The GNU Image-Finding Tool (GIFT). A content-based image retrieval system. It lets users query by image and refine the results through relevance feedback. It includes a tool for indexing images stored in hierarchical directories.

Resource link:

» Gnuplot

Resource type:

Mathematical Software

Description:

The classic program for visualizing scientific data. Distributed under a license similar to the GPL.

Resource link:

» Google APIs

Resource type:

NLP and IR Software

Description:

Google Web APIs. Methods that let developers send requests to Google from their own applications. Several development languages are supported: Java, Perl, and Visual Studio .NET, among others.

Resource link:

» GSL

Resource type:

Mathematical Software

Description:

The GNU Scientific Library for numerical computation. It provides a wide range of mathematical routines and special functions for the C and C++ languages. GPL License

Resource link:

» Hashtags-sp

Resource type:

Lexicon

Description:

A linguistic resource for research on sentiment analysis of Spanish tweets. The lexicon is composed of 172 positive Twitter hashtags and 127 negative Twitter hashtags.

Files of the resource:

To obtain the resource, send an email to Salud M. Jiménez Zafra (sjzafra@ujaen.es) or Eugenio Martínez Cámara (emcamara@ujaen.es).

» HEP Collection

Resource type:

Corpora

Description:

This corpus is intended for the study of multi-label text classifiers. It consists of scientific papers in the field of High Energy Physics (HEP) obtained from the CDS document server of the European Organization for Nuclear Research (CERN). The corpus is divided into three subsets (called partitions), each consisting of two files: one containing the record of each item (with information such as the abstract, authors and, of course, classes or keywords) in compressed XML format, and another containing a plain-text version of the complete paper generated from the PDF available in the CERN databases (tar + gzip format). Classes are defined by the XML tag KEYWORD. These are the labels manually assigned from the DESY thesaurus. You can get more information about the DESY thesaurus.

  • Partition hepth: 18,114 Theoretical Physics documents (metadata – 5.3 MB) (papers – 226 MB)
  • Partition hepex: 2,599 Experimental Physics documents (metadata – 1.6 MB) (papers – 28 MB)
  • Partition astroph: 2,716 Astrophysics documents (metadata – 1.1 MB) (papers – 29 MB)
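
As a rough illustration of how the KEYWORD labels could be turned into multi-label training targets, here is a minimal Python sketch. Only the KEYWORD tag name comes from the description above. The `records`/`record` wrapper elements, the `id` attribute, the `ABSTRACT` tag and the sample values are hypothetical stand-ins, so the real files may need different element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the KEYWORD tag name is taken from the corpus
# description; every other element name and value is an assumption.
SAMPLE = """<records>
  <record id="r1">
    <ABSTRACT>placeholder abstract</ABSTRACT>
    <KEYWORD>string model</KEYWORD>
    <KEYWORD>supersymmetry</KEYWORD>
  </record>
  <record id="r2">
    <ABSTRACT>placeholder abstract</ABSTRACT>
    <KEYWORD>supersymmetry</KEYWORD>
  </record>
</records>"""


def labels_per_record(xml_text):
    """Map each record id to the list of KEYWORD labels it carries."""
    root = ET.fromstring(xml_text)
    result = {}
    for record in root.iter("record"):
        labels = [kw.text.strip() for kw in record.iter("KEYWORD") if kw.text]
        result[record.get("id")] = labels
    return result


def to_indicator_rows(label_map):
    """Build one 0/1 row per record over the sorted label vocabulary."""
    vocab = sorted({label for labels in label_map.values() for label in labels})
    rows = {
        rid: [1 if label in labels else 0 for label in vocab]
        for rid, labels in label_map.items()
    }
    return vocab, rows


if __name__ == "__main__":
    label_map = labels_per_record(SAMPLE)
    vocab, rows = to_indicator_rows(label_map)
    print(vocab)
    print(rows)
```

Each row can then feed one binary classifier per label, which is the one-against-all setting named in the reference for this corpus.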

Updated on 23.04.2007: Thanks to Ioannis Katakis, from Aristotle University of Thessaloniki (Greece), for correcting some problems in the supplied XML.

How to reference

This corpus has been prepared by Arturo Montejo Ráez with metadata supplied by Jens Vigen and the CDS Support Team. For references use:

@InProceedings{montejo2004,
  author =    {Montejo-Ráez, A. and Steinberger, R. and Ureña-López, L. A.},
  title =     {Adaptive selection of base classifiers in one-against-all
               learning for large multi-labeled collections},
  booktitle = {Advances in Natural Language Processing: 4th International
               Conference, EsTAL 2004},
  pages =     {1--12},
  year =      {2004},
  editor =    {Vicedo, J. L. and others},
  location =  {Alicante, Spain},
  number =    {3230},
  series =    {Lecture Notes in Artificial Intelligence},
  publisher = {Springer}
}

Resource files

hep-collection.rar

» Illinois Wikifier

Resource type:

NLP and IR Software

Description:

Named entity recognizer (NER) that identifies and marks entities and links them to Wikipedia. Source code available. Custom license, but free

Resource link:

» Indri

Resource type:

NLP and IR Software

Description:

Retrieval engine based on Lemur. It also supports passage retrieval. BSD License

Resource link:

» iSOL

Resource type:

Lexicon

Description:

iSOL is a list of domain-independent opinion signal words in Spanish.

The resource was built starting from the word list maintained by Professor Bing Liu (Bing Liu’s Opinion Lexicon). That list was automatically translated using the Reverso translator and then corrected manually.

The list consists of 2,509 positive and 5,626 negative words. For more information on how the list was developed see the paper: Semantic Orientation for Polarity Classification in Spanish Reviews.
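
One common way to use a polarity word list like this is to count positive and negative matches in a text. The Python sketch below illustrates that idea only and is not necessarily the method of the paper. The four example words are hypothetical placeholders rather than entries taken from iSOL, and loading the real list from isol.tar.gz is left out because its file layout is not described here.

```python
import re

# Hypothetical placeholder entries; the real iSOL lists are far larger.
POSITIVE = {"bueno", "excelente"}
NEGATIVE = {"malo", "terrible"}


def polarity(text, positive=POSITIVE, negative=NEGATIVE):
    """Label a text by comparing counts of positive and negative lexicon words."""
    tokens = re.findall(r"\w+", text.lower())
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(polarity("La comida es excelente y el servicio bueno"))
    print(polarity("Un hotel terrible"))
```

A simple counter like this ignores negation and intensifiers, so it is best read as a baseline.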

Reference

If you use iSOL, please cite the following paper:

Molina-González, M. D., Martínez-Cámara, E., Martín-Valdivia, M. T., & Perea-Ortega, J. M. (2013). Semantic orientation for polarity classification in Spanish reviews. Expert Systems with Applications, 40(18), 7250-7257.

Files of the resource:

isol.tar.gz

» JBNC

Resource type:

Machine Learning and Data Mining Software

Description:

Toolkit for developing classifiers based on Bayesian networks. Classifiers: Naive Bayes, TAN, FAN, STAN, STAND, SFAN. GPL License

Resource link:

» JRE-JDK

Resource type:

NLP and IR Software

Description:

Java Runtime Environment – Java Development Kit. The JRE contains the Java virtual machine, runtime class libraries and the Java application launcher required to run programs written in Java. The JDK is a development environment for building applications, applets and components in the Java programming language. GPL License

Resource link:

» KEA

Resource type:

Machine Learning and Data Mining Software

Description:

Keyphrase and keyword extractor for large collections of documents. Implemented in Java, platform independent. GPL License

Resource link:

» Lemur

Resource type:

NLP and IR Software

Description:

Tools for language modeling and information retrieval. Written in C and C++. It runs on Unix operating systems and can also run on Windows. Free use license.

Resource link:

» Lexical Tools

Resource type:

NLP and IR Software

Description:

Package of linguistic resources from the National Library of Medicine. Developed in Java 1.5 with an integrated HyperSonic SQL database. Freeware

Resource link: