Multilingual and cross-lingual topic detection and tracking
The speaker will present NewsExplorer, a multilingual news aggregation and analysis system, focusing on two prominent functionalities of this fully automatic online application:
- multilingual topic detection and tracking (TDT);
- cross-lingual topic tracking.
While there are established solutions for linking historically related news in the same language (topic tracking), the state of the art in linking related documents across languages is restricted to bilingual approaches (such as Machine Translation, bilingual dictionaries or bilingual vector space representations).
When news must be linked across many languages, these bilingual approaches become impractical, because the number of language pairs for n languages is (n^2-n)/2 = n(n-1)/2, which grows quadratically with n.
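To make the quadratic growth concrete, the short Python check below evaluates (n^2-n)/2 for a few language counts and cross-checks it by enumerating the pairs explicitly. The function name `language_pairs` is hypothetical. For comparison it also prints n, the number of per-language analysis pipelines that an approach needing no pair-specific resources would require; that linear count is an illustrative comparison, not a figure from the talk.

```python
from itertools import combinations


def language_pairs(n: int) -> int:
    """Number of unordered language pairs for n languages: (n^2 - n) / 2."""
    return (n * n - n) // 2


# Cross-check the closed form against explicit enumeration of all pairs.
for n in (2, 10, 19):
    explicit = len(list(combinations(range(n), 2)))
    assert explicit == language_pairs(n)
    print(f"{n:2d} languages -> {language_pairs(n):3d} bilingual pairs "
          f"vs. {n:2d} per-language pipelines")
```

For 10 languages this gives the 45 pairs quoted for NewsExplorer's cross-lingual tracking; for all 19 analysed languages it would be 171 pairs.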
NewsExplorer therefore uses an alternative approach: monolingual analysis steps produce a (nearly) language-independent representation of each document's content. This abstract representation makes it possible to link related news across many languages quickly and easily. NewsExplorer currently analyses 19 languages, and cross-lingual topic tracking is implemented and carried out daily for 10 of them (45 language pairs).
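The abstract does not say which features make up the language-independent representation. As a minimal sketch, assume each article has already been reduced, by language-specific analysis, to counts of language-neutral identifiers (person IDs, country codes, subject descriptors); related articles in different languages can then be linked with one cosine-similarity test, whatever the language pair. The feature names, the example documents and the 0.5 threshold are all hypothetical illustrations, not NewsExplorer's actual scheme.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical language-independent features, assumed to come from
# monolingual analysis steps (entity recognition, geo-tagging, descriptors).
docs = {
    ("en", "a1"): Counter({"PER:1234": 3, "GEO:FR": 2, "DESC:elections": 1}),
    ("de", "b7"): Counter({"PER:1234": 2, "GEO:FR": 1, "DESC:elections": 2}),
    ("es", "c3"): Counter({"PER:9876": 2, "GEO:BR": 3, "DESC:football": 1}),
}

THRESHOLD = 0.5  # hypothetical similarity cut-off


def link_across_languages(documents, threshold=THRESHOLD):
    """Return (doc_a, doc_b, similarity) for similar documents in different languages."""
    keys = sorted(documents)
    links = []
    for i, ka in enumerate(keys):
        for kb in keys[i + 1:]:
            if ka[0] == kb[0]:
                continue  # same language: left to monolingual topic tracking
            sim = cosine(documents[ka], documents[kb])
            if sim >= threshold:
                links.append((ka, kb, round(sim, 3)))
    return links


print(link_across_languages(docs))
```

Only the English and German articles share features here, so they form the single link. The design point is that one similarity function serves every language pair, so adding a language means adding its monolingual analysis rather than new pair-specific resources.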
The speaker will explain the underlying technology and demonstrate the system live. NewsExplorer, which is publicly accessible at http://press.jrc.it/NewsExplorer, is part of the Europe Media Monitor (EMM) family of applications, which also includes NewsBrief and the Medical Information System MedISys.
