Theses

Resolution of lexical ambiguity by learning vector quantization

Manuel García Vega. December 2006

Abstract:
Word sense disambiguation (WSD) is the problem of assigning a specific meaning to a polysemous word based on its context. The problem has attracted interest almost since the beginning of computing in the 1950s. Disambiguation is an intermediate task rather than an end in itself: it is very useful, and sometimes necessary, for many NLP problems such as information retrieval, text categorization, and machine translation.

The goal of this thesis is to implement a word sense tagger based on the Vector Space Model, optimizing the weights of the training vectors with an LVQ (Learning Vector Quantization) neural network, Kohonen's supervised neural model, and to propose a uniform method for integrating the resources used to train the network. The LVQ network parameters have been tuned for the disambiguation problem.
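As a rough illustration of the idea, the following minimal sketch shows the basic LVQ1 update rule applied to toy context vectors for one ambiguous word. It is not the implementation from the thesis: the feature set, the sense labels "finance" and "river", the learning rate, the epoch count, and the function names are all assumptions made for the example.

```python
import math

def nearest(prototypes, x):
    """Index of the prototype closest to x (Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda i: math.dist(prototypes[i], x))

def train_lvq1(samples, labels, prototypes, proto_labels, lr=0.1, epochs=20):
    """LVQ1: move the winning prototype toward a sample of its own class,
    and away from a sample of a different class."""
    W = [[float(v) for v in p] for p in prototypes]  # work on a copy
    for epoch in range(epochs):
        alpha = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate (assumed schedule)
        for x, label in zip(samples, labels):
            i = nearest(W, x)
            sign = 1.0 if proto_labels[i] == label else -1.0
            W[i] = [w + sign * alpha * (xj - w) for w, xj in zip(W[i], x)]
    return W

def predict(W, proto_labels, x):
    """Assign the sense of the nearest prototype."""
    return proto_labels[nearest(W, x)]

# Hypothetical context vectors for the word "bank".
# Assumed features: counts of [money, river, loan, water] in the context.
samples = [[1, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 1], [0, 1, 0, 0]]
labels = ["finance", "finance", "river", "river"]

# One prototype per sense, initialised from one training sample of each sense.
proto_labels = ["finance", "river"]
W = train_lvq1(samples, labels, [samples[0], samples[2]], proto_labels)

print(predict(W, proto_labels, [0, 0, 1, 0]))  # context mentioning only "loan"
print(predict(W, proto_labels, [0, 0, 0, 1]))  # context mentioning only "water"
```

In this sketch each sense is represented by a single prototype vector in the same space as the context vectors, so tagging a new occurrence reduces to a nearest-prototype lookup.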

This work has shown that neural networks, specifically Kohonen models, solve the problem of lexical ambiguity resolution very effectively. They provide robustness, because the LVQ network is insensitive to small changes and consistent results were observed regardless of the training; flexibility, because they are easily applicable to any NLP task; scalability, because many different training texts can be introduced to suit any domain; and effectiveness, because the results obtained are comparable to, and in many cases better than, those of traditional methods used to solve the same problems.

The SemCor corpus and the WordNet lexical database have been integrated, and this work also yields a method for automatically integrating any corpus. Experiments show that the network performs well on the specific problem of disambiguation.

(Link TESEO)