A huge amount of unstructured information is currently available online, both on the public web and in the “hidden” web (intranets, digital libraries, etc.). This information can be visual or textual, and it is found in all kinds of multimedia documents (video, images, audio, conference transcripts, etc.). Information retrieval over such varied collections poses challenges such as indexing and merging.