WaCOS: Watermarking Corpora Online System

Authors: David Pinto
URL: http://www.dsic.upv.es/grupos/nle, http://nlp.dsic.upv.es:8080/watermarker, http://nlp.cs.buap.mx/watermarker
Contact: David Eduardo Pinto Avendaño <dpinto (at) cs.buap.mx>


Description

The Watermarking Corpora On-line System (WaCOS) is made up of a set of measures for the assessment of text corpora.

Functionality

WaCOS lets researchers in linguistics and computational linguistics study the following corpus features: domain broadness, shortness, class imbalance, stylometry and structure. It provides a user-friendly interface for evaluating corpora easily.
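
As a rough illustration of what two of these features capture, the Python sketch below computes a simple shortness value (mean document length in tokens) and a simple class imbalance value (root-mean-square deviation of the class proportions from the uniform share). The formulas and the function names `shortness` and `class_imbalance` are assumptions chosen for illustration; they are not taken from WaCOS and may differ from the measures it actually implements.

```python
import math
from collections import Counter


def shortness(docs):
    """Mean number of whitespace-separated tokens per document (illustrative)."""
    if not docs:
        raise ValueError("corpus is empty")
    return sum(len(d.split()) for d in docs) / len(docs)


def class_imbalance(labels):
    """RMS deviation of class proportions from the uniform share 1/k.

    Illustrative formula, not the WaCOS definition.
    Returns 0.0 for a perfectly balanced corpus.
    """
    if not labels:
        raise ValueError("no labels given")
    counts = Counter(labels)
    k = len(counts)          # number of distinct classes
    n = len(labels)          # number of documents
    expected = 1.0 / k       # share each class would have if balanced
    return math.sqrt(sum((c / n - expected) ** 2 for c in counts.values()) / k)
```

For example, `shortness(["short text", "a slightly longer text here"])` averages document lengths of 2 and 5 tokens, and `class_imbalance` grows as one class comes to dominate the label list.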

Technology

The WaCOS front end is written in PHP and integrates a set of modules written in several programming languages (C, C++, Java, AWK). To assess the relative hardness of a given corpus, these components use n-gram language modelling, the Zipf distribution of frequencies, density-based measures and internal clustering validity measures, among others.
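
To make the Zipf-based component more concrete, here is a minimal Python sketch that estimates how closely a token stream follows a Zipf distribution by fitting a least-squares line to log frequency against log rank. The function name `zipf_slope` and the choice of an ordinary least-squares fit are assumptions for illustration; the source does not describe how WaCOS computes its Zipf-based measure.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 suggests a classic Zipfian distribution.
    Illustrative only; not the WaCOS implementation.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tokens")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

A corpus whose word frequencies are exactly proportional to 1/rank gives a slope of -1 (up to floating-point error); flatter or steeper slopes indicate a departure from that pattern.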

Technical Requirements

The end user needs only a web browser to access the on-line system.

Modules

-

Innovation

WaCOS is a freely available web-based tool for studying the peculiarities of text corpus features.

Development

Developed as part of David Pinto’s Ph.D. and the MiDES CICYT TIN2006-15265-C06-04 research project.

Publications

  • David Pinto: On Clustering of Narrow Domain Short-Text Corpora. PhD Thesis, Universidad Politécnica de Valencia, Spain, July 2008.
  • Diego Ingaramo, David Pinto, Paolo Rosso, Marcelo Errecalde: Evaluation of Internal Validity Measures in Short-Text Corpora. CICLing 2008. Lecture Notes in Computer Science 4919, Springer-Verlag: 555-567, 2008.
  • Rafael Guzman, Manuel Montes, Paolo Rosso, Luis Villaseñor-Pineda and David Pinto: Semi-supervised Approach for WSD using the Web as Corpus. CICLing 2009. Lecture Notes in Computer Science, Springer-Verlag, 2009.