WaCOS: Watermarking Corpora Online System
|Contact||David Eduardo Pinto Avendaño <dpintocs.buap.mx>|
The Watermarking Corpora On-line System (WaCOS) is made up of a set of measures for the assessment of text corpora.
WaCOS allows researchers in linguistics and computational linguistics to study the following corpus features: domain broadness, shortness, class imbalance, stylometry and structure. It provides a friendly interface that makes corpora easy to evaluate.
The WaCOS front end is written in PHP and integrates a set of modules written in several programming languages (C, C++, Java, AWK). To assess the relative hardness of a given corpus, these components use n-gram language modelling, the Zipf distribution of frequencies, density-based measures and internal clustering validity measures, among others.
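As a rough illustration of the kind of corpus measures mentioned above, the following Python sketch computes three simple proxies: mean document length (for shortness), the spread of class proportions (for class imbalance), and the slope of the log-log rank-frequency curve (for the Zipf distribution). The function names, the whitespace tokenizer and the formulas themselves are assumptions made for this example; they are not the WaCOS implementation, whose modules are written in other languages.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption of this sketch).
    return text.lower().split()


def mean_document_length(docs):
    # Shortness proxy: average number of tokens per document.
    return sum(len(tokenize(d)) for d in docs) / len(docs)


def class_imbalance(labels):
    # Standard deviation of the class proportions around the uniform
    # share 1/k; 0.0 means perfectly balanced classes.
    counts = Counter(labels)
    k = len(counts)
    n = len(labels)
    return math.sqrt(sum((c / n - 1 / k) ** 2 for c in counts.values()) / k)


def zipf_slope(docs):
    # Least-squares slope of log(frequency) against log(rank).
    # Requires at least two distinct tokens in the corpus.
    counts = Counter(t for d in docs for t in tokenize(d))
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    docs = ["a a a a b b c", "a b"]  # toy corpus, made up for the example
    labels = ["x", "x", "x", "y"]    # hypothetical class labels
    print(mean_document_length(docs))
    print(class_imbalance(labels))
    print(zipf_slope(docs))
```

A slope close to -1 would be in line with the classical Zipf law, although a toy corpus this small says nothing meaningful about it.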
The end user needs only an Internet browser to access the on-line system.
WaCOS is a freely available web-based tool that may be used to study the peculiarities of textual corpus features.
WaCOS was developed as part of David Pinto’s Ph.D. thesis and the MiDES CICYT TIN2006-15265-C06-04 research project.
- David Pinto: On Clustering of Narrow Domain Short-Text Corpora. PhD Thesis, Universidad Politécnica de Valencia, Spain, July 2008.
- Diego Ingaramo, David Pinto, Paolo Rosso, Marcelo Errecalde: Evaluation of Internal Validity Measures in Short-Text Corpora. CICLing 2008. Lecture Notes in Computer Science 4919, Springer-Verlag: 555-567, 2008.
- Rafael Guzman, Manuel Montes, Paolo Rosso, Luis Villaseñor-Pineda and David Pinto: Semi-supervised Approach for WSD using the Web as Corpus. CICLing 2009. Lecture Notes in Computer Science, Springer-Verlag, 2009.