Theses

BRUJA: A System for Multilingual Question Answering

Miguel Ángel García Cumbreras. May 2009

Question Answering is a task within natural language processing and information retrieval. It can be defined as the automated process by which a computer finds concrete answers to specific questions asked by users.

Question Answering systems not only locate relevant documents or passages (within a document collection or other unstructured information), but also find, extract, and present the answer to the end user. This saves users the time they would otherwise spend searching and reading the relevant material to find the answer manually.

The main components of a Question Answering system are:

- Analysis of the question
- Retrieval of documents or relevant passages
- Extraction of answers
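
The three components above can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not BRUJA's implementation: the stopword list, the keyword-overlap ranking, the year-matching rule, and all function names are hypothetical choices made only to show how the stages connect.

```python
import re

# Toy stopword list; an assumption made for this sketch only.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "was",
             "who", "where", "when", "what"}


def analyze_question(question):
    """Stage 1: guess the expected answer type and keep content keywords."""
    tokens = re.findall(r"\w+", question.lower())
    expected = {"who": "PERSON", "where": "PLACE", "when": "DATE"}
    answer_type = expected.get(tokens[0], "OTHER") if tokens else "OTHER"
    keywords = [t for t in tokens if t not in STOPWORDS]
    return answer_type, keywords


def retrieve_passages(keywords, passages, top_k=3):
    """Stage 2: rank passages by how many question keywords they contain."""
    def overlap(passage):
        words = set(re.findall(r"\w+", passage.lower()))
        return sum(1 for k in keywords if k in words)

    scored = [(overlap(p), p) for p in passages]
    scored = [(s, p) for s, p in scored if s > 0]  # drop non-matching passages
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def extract_answer(answer_type, passages):
    """Stage 3: pull a short answer out of the retrieved passages."""
    for passage in passages:
        if answer_type == "DATE":
            match = re.search(r"\b\d{4}\b", passage)  # first four-digit year
            if match:
                return match.group(0)
    # Fallback for types this toy cannot handle: return the best passage.
    return passages[0] if passages else None


def answer(question, passages):
    """Run the three stages in sequence."""
    answer_type, keywords = analyze_question(question)
    relevant = retrieve_passages(keywords, passages)
    return extract_answer(answer_type, relevant)
```

Calling `answer("When is the thesis dated?", ["The thesis is dated May 2009."])` walks through all three stages. A real system would replace each stage with far richer processing, but the division of labour stays the same.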

Today there are systems that answer questions over collections in a single language while accepting questions in any language. Such systems need only one translation step, from the language of the question into the language of the collections, after which they work in monolingual mode.

In this study a multilingual question answering system called BRUJA (Búsqueda de Respuestas en la Universidad de Jaén, “search for answers at the University of Jaén”) has been researched and developed. The term “multilingual” is used here in its full sense, also referred to as CLIR (cross-language information retrieval). This means accepting questions in any of the supported languages, using collections in several languages, and returning the final answer in the same language as the question.
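
The multilingual flow described above (a question in any supported language, collections in several languages, an answer in the language of the question) can be sketched as follows. This is an illustrative outline only, not the architecture of BRUJA: the `translate` callable, the per-language answering functions, and the choice of the highest-scoring candidate are all assumptions introduced for the example.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical interfaces, assumed for this sketch:
#   translate(text, source_lang, target_lang) -> translated text
#   a monolingual QA function: question -> (answer, score), or None
Translator = Callable[[str, str, str], str]
MonolingualQA = Callable[[str], Optional[Tuple[str, float]]]


def answer_multilingual(
    question: str,
    question_lang: str,
    qa_by_lang: Dict[str, MonolingualQA],
    translate: Translator,
) -> Optional[str]:
    """Query every collection in its own language and answer in the
    language of the question."""
    candidates = []
    for lang, qa in qa_by_lang.items():
        # Translate the question only when the collection language differs.
        if lang == question_lang:
            local_question = question
        else:
            local_question = translate(question, question_lang, lang)
        result = qa(local_question)
        if result is not None:
            found, score = result
            candidates.append((score, lang, found))

    if not candidates:
        return None

    # Assumed merging strategy: keep the single highest-scoring candidate.
    _, best_lang, best_answer = max(candidates)
    if best_lang == question_lang:
        return best_answer
    # Return the answer in the same language as the question.
    return translate(best_answer, best_lang, question_lang)
```

Passing the translator and the monolingual systems in as parameters keeps the sketch independent of any particular translation service or retrieval engine, and adding a language amounts to adding an entry to `qa_by_lang`.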

Several candidate solutions for each module were investigated, developed, and tested, and then integrated into a final system. The final version works in three languages (English, Spanish, and French), with possible expansion to other languages.

This PhD thesis was awarded the rating of Excellent Cum Laude. In 2010 it received the prize for the best doctoral thesis in the field of Natural Language Processing and Information Retrieval from the Spanish Society of Natural Language Processing (SEPLN), and it was later published in full as a monograph.

(Link TESEO)

(Published as a SEPLN monograph and available here in PDF)