Monolingual, Multilingual and Distributed Information Retrieval
Systems for information retrieval (IR) are responsible for selecting and retrieving the documents that are relevant to a user's information need. These systems return a list of relevant documents, usually ranked by a score that estimates how well each document satisfies that need.
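As a minimal illustration of ranking by a relevance score, the following Python sketch scores documents with a simple TF-IDF weighting: term frequency normalized by document length, multiplied by the logarithm of the inverse document frequency. TF-IDF is only one common scoring scheme, chosen here as an assumption because the text does not say which measure is used. The functions `tokenize` and `rank` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenization; real systems also strip punctuation, stem, etc.
    return text.lower().split()

def rank(query, docs):
    """Return (doc_index, score) pairs sorted by descending TF-IDF score."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                idf = math.log(n / df[term])
                score += (tf[term] / len(toks)) * idf
        scores.append((i, score))
    # Highest score first: this ordering is the ranked list returned to the user
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

docs = [
    "distributed retrieval of documents",
    "word sense disambiguation in context",
    "retrieval of multilingual documents and retrieval evaluation",
]
print(rank("multilingual retrieval", docs))
```

Here the third document ranks first because it contains both query terms, including the rarer (and therefore more heavily weighted) "multilingual".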
In the last decade, interest in developing systems for multilingual information retrieval (CLIR, Cross-Lingual Information Retrieval) has grown dramatically (Grefenstette, 1998). A CLIR system is an information retrieval system capable of operating on a multilingual document collection.
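One common way for a CLIR system to bridge languages is to translate the query with a bilingual dictionary and then search the target-language documents. The sketch below assumes that approach; the tiny Spanish-English dictionary `ES_EN` and the helpers `translate_query` and `match_count` are hypothetical. Note how the ambiguous term "recuperación" yields both "retrieval" and "recovery", which is one reason disambiguation matters in CLIR.

```python
# Hypothetical toy Spanish-to-English dictionary; real CLIR systems rely on
# large bilingual lexicons or machine translation.
ES_EN = {
    "recuperación": ["retrieval", "recovery"],
    "información": ["information"],
    "multilingüe": ["multilingual"],
}

def translate_query(query, dictionary):
    """Replace each source-language term with all of its candidate translations.

    Terms missing from the dictionary (proper nouns, or here the stopword
    "de") are kept unchanged.
    """
    translated = []
    for term in query.lower().split():
        translated.extend(dictionary.get(term, [term]))
    return translated

def match_count(query_terms, document):
    # Number of translated query terms that occur in the target-language document
    doc_terms = set(document.lower().split())
    return sum(1 for t in query_terms if t in doc_terms)

english_docs = ["cross lingual information retrieval", "data recovery tools"]
q = translate_query("recuperación de información", ES_EN)
print(q)  # ['retrieval', 'recovery', 'de', 'information']
print([match_count(q, d) for d in english_docs])  # [2, 1]
```

The unwanted "recovery" translation lets the second, irrelevant document match the query.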
The search engines available on the Web or in large corporations are typically based on a single document base: a local copy of the collections they can access. However, when not all documents are available to be copied and indexed centrally, this approach no longer works. This is the case in large corporations, which usually have large, widely distributed collections, and on the Internet, where most of the information is generated dynamically and therefore cannot be reached by traditional search engines. This is the basic motivation for distributed information retrieval systems.
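In a distributed setting, each collection is searched by its own engine, and the separate result lists must then be merged into a single ranking. The sketch below shows one simple merging strategy, min-max normalization of each list's scores followed by a global sort. It is an assumption for illustration only: real distributed IR systems also perform collection selection and more refined score calibration, and the collection names and scores here are invented.

```python
def normalize(results):
    """Min-max normalize one collection's scores into [0, 1].

    results: list of (doc_id, raw_score) pairs returned by a single collection.
    """
    if not results:
        return []
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are equal: give every document the same normalized score
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (s - lo) / (hi - lo)) for doc, s in results]

def merge(result_lists, k=5):
    """Merge per-collection result lists into one global top-k ranking."""
    merged = []
    for results in result_lists:
        merged.extend(normalize(results))
    merged.sort(key=lambda pair: pair[1], reverse=True)
    return merged[:k]

# Hypothetical results from two independent collections whose engines
# use very different score scales.
intranet = [("intranet/a", 12.0), ("intranet/b", 8.0), ("intranet/c", 4.0)]
library = [("library/x", 0.9), ("library/y", 0.3)]
print(merge([intranet, library], k=3))
# [('intranet/a', 1.0), ('library/x', 1.0), ('intranet/b', 0.5)]
```

Without normalization, every intranet document would outrank every library document simply because its engine produces larger numbers.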
A Question Answering (QA) system can be defined as a system that automatically finds concrete answers to user queries. Such systems are very useful when the user needs a specific piece of data and does not want to review all the documentation related to the topic.
Word Sense Disambiguation
Word Sense Disambiguation (WSD) is the task of identifying, from a given set of candidate senses, the meaning a word has in a given context. Disambiguation is not an end in itself, but it is a necessary intermediate step for some Natural Language Processing (NLP) tasks.
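A classic baseline for WSD is the simplified Lesk algorithm, which picks the candidate sense whose dictionary definition (gloss) shares the most words with the surrounding context. The sketch below assumes that technique; the two-sense inventory for "bank" and the small stopword list are hypothetical, and practical systems use far richer lexical resources.

```python
# Tiny illustrative stopword list so that function words do not count as overlap
STOPWORDS = {"a", "an", "and", "he", "of", "on", "or", "she", "that", "the", "to"}

def content_words(text):
    # Lowercased tokens minus the stopwords above
    return {w for w in text.lower().split() if w not in STOPWORDS}

def simplified_lesk(context, senses):
    """Pick the sense whose gloss shares the most content words with the context.

    senses: mapping sense_id -> gloss text (the set of candidate senses).
    """
    context_words = content_words(context)
    best_sense, best_overlap = None, -1
    for sense_id, gloss in senses.items():
        overlap = len(context_words & content_words(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense

# Hypothetical sense inventory for the ambiguous word "bank"
BANK_SENSES = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "the sloping land beside a river or lake",
}
print(simplified_lesk("she sat on the bank of the river and watched the water", BANK_SENSES))  # bank/river
print(simplified_lesk("he went to the bank to deposit money", BANK_SENSES))  # bank/finance
```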
Automated Text Categorization (ATC) involves the automatic classification of documents into predefined categories.
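A common baseline for text categorization is a multinomial Naive Bayes classifier. The following sketch, trained on four invented example documents, assumes whitespace tokenization and Laplace (add-one) smoothing; the functions `train_nb` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled_docs):
    """Train a multinomial Naive Bayes model from (text, category) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, category in labeled_docs:
        class_counts[category] += 1
        for w in text.lower().split():
            word_counts[category][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def classify(text, model):
    """Return the predefined category with the highest log-posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for category in class_counts:
        # log P(category) + sum of log P(word | category), Laplace-smoothed
        score = math.log(class_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for w in text.lower().split():
            if w not in vocab:
                continue  # ignore words never seen in training
            score += math.log((word_counts[category][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = category, score
    return best

training = [
    ("the team won the match", "sports"),
    ("the striker scored a goal", "sports"),
    ("the bank raised interest rates", "economy"),
    ("markets fell as rates rose", "economy"),
]
model = train_nb(training)
print(classify("interest rates and markets", model))  # economy
print(classify("a late goal won the match", model))   # sports
```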
A Named Entity Recognition (NER) system tries to find, within a text or document, the expressions (names of people, places, organizations, dates, etc.) that directly answer simple questions such as who?, where? or when?
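The simplest form of NER looks up known names in a list (a gazetteer) and finds dates with patterns. The sketch below assumes that approach only to show how recognized entity types map onto simple questions; the gazetteer entries and the `find_entities` helper are hypothetical, and the plain substring match does not check word boundaries. Practical NER systems use statistical or neural sequence models.

```python
import re

# Hypothetical toy gazetteer: lowercased name -> entity type
GAZETTEER = {
    "miguel de cervantes": "PERSON",  # answers "who?"
    "alcalá de henares": "LOCATION",  # answers "where?"
    "madrid": "LOCATION",
}
# Four-digit years from 1000 to 2099, which answer "when?"
YEAR = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")

def find_entities(text):
    """Return (lowercased name or year, entity type) pairs found in the text."""
    lowered = text.lower()
    entities = [(name, etype) for name, etype in GAZETTEER.items() if name in lowered]
    entities.extend((m.group(), "DATE") for m in YEAR.finditer(text))
    return entities

print(find_entities("Miguel de Cervantes was born in Alcalá de Henares in 1547."))
# [('miguel de cervantes', 'PERSON'), ('alcalá de henares', 'LOCATION'), ('1547', 'DATE')]
```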
Multimodal Information Retrieval
Currently, there is a huge amount of unstructured information available online, both on the public web and in the “hidden” web (intranets, digital libraries, etc.). This information can be visual as well as textual, and is found in all kinds of multimedia documents (video, images, audio, conference transcripts, etc.). Retrieving information from such varied collections raises challenges such as indexing heterogeneous content and merging results across media.
Opinion Mining aims to apply the principles of data mining (discovery of relationships, classes, etc.) to the analysis of product reviews and of reviews posted on blogs and other collaborative environments. It attempts to determine the polarity of the opinion expressed by the author of a comment in order to extract an overall assessment from it. This discipline is of considerable interest for e-commerce systems, but its scope is much broader.
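A basic way to estimate polarity is to sum word scores from a sentiment lexicon, flipping the sign of a word that follows a negation. The sketch below assumes that lexicon-based approach; the lexicon, the negation list and the `polarity` function are hypothetical and far smaller than any real resource.

```python
# Hypothetical miniature polarity lexicon (word -> score)
LEXICON = {"good": 1, "great": 2, "excellent": 2, "bad": -1, "poor": -1, "terrible": -2}
NEGATIONS = {"not", "never", "no"}

def polarity(review):
    """Classify a review as positive, negative or neutral by summing word scores.

    A negation word flips the sign of the next sentiment-bearing word.
    """
    score, negate = 0, False
    for token in review.lower().replace(",", " ").replace(".", " ").split():
        if token in NEGATIONS:
            negate = True
        elif token in LEXICON:
            score += -LEXICON[token] if negate else LEXICON[token]
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("Great camera, excellent battery."))                # positive
print(polarity("The screen is not good and the sound is poor."))  # negative
```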
Recommender systems serve consumers by suggesting products that may be of interest to them. In our group we work to improve current collaborative filtering systems by incorporating the analysis of human language.
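For reference, a plain user-based collaborative filter, without any language analysis, can be sketched as follows. Users are compared by the cosine similarity of their rating vectors (unrated items count as 0), and each item the target user has not rated is scored by the similarity-weighted average of the other users' ratings. This is a generic textbook baseline assumed for illustration, not the group's system; the ratings matrix and function names are hypothetical.

```python
import math

# Hypothetical user -> {item: rating} matrix (ratings 1-5)
RATINGS = {
    "ana": {"camera": 5, "phone": 4},
    "bruno": {"camera": 5, "phone": 5, "tablet": 4, "laptop": 2},
    "carla": {"camera": 1, "tablet": 2, "laptop": 5},
}

def cosine(u, v):
    """Cosine similarity of two rating vectors, treating unrated items as 0."""
    dot = sum(u[i] * v[i] for i in set(u) & set(v))
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted ratings of other users."""
    scores, weights = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        if sim <= 0:
            continue  # ignore users with no positive similarity
        for item, r in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted.items(), key=lambda pair: pair[1], reverse=True)

print(recommend("ana", RATINGS))  # tablet is ranked first, laptop second
```

Because Ana's ratings resemble Bruno's far more than Carla's, Bruno's preference for the tablet dominates the prediction.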