Semantic TED

Multilingual Semantic Linking for Video Streams: Making “Ideas Worth Sharing” More Accessible

This paper describes our (winning!) submission to the Developers Challenge at WoLE2013, “Doing Good by Linking Entities.” We present a fully automatic system – called “Semantic TED” – that suggests links to Wikipedia articles for video streams in multiple languages, based on the subtitles that accompany the visual content. The system is applied to online conference talks. In particular, we adapt a recently proposed semantic linking approach for streams of television broadcasts so that it generates contextual links while a TED talk is being viewed. TED is a highly popular global conference series covering many research domains; the publicly available talks had accumulated more than one billion views at the time of writing. We exploit the multilinguality of Wikipedia and the TED subtitles to provide contextual suggestions in the language of the user watching a video. In this way, a vast source of educational and intellectual content is made accessible to a broad audience that might otherwise have difficulty interpreting it.
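To make the idea of language-aware suggestions concrete, here is a minimal sketch. It is not the system described in the paper: the `ANCHORS` lexicon, the `INTERLANGUAGE` table, the function name `suggest_links`, and the simple substring matching are all hypothetical stand-ins, and the sketch assumes English subtitles whose links are mapped to the viewer's language.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: anchor phrase -> English Wikipedia title.
ANCHORS: Dict[str, str] = {
    "climate change": "Climate change",
    "neuron": "Neuron",
}

# Hypothetical interlanguage table: English title -> {language code: title}.
INTERLANGUAGE: Dict[str, Dict[str, str]] = {
    "Climate change": {"nl": "Klimaatverandering", "de": "Klimawandel"},
    "Neuron": {"nl": "Neuron", "de": "Nervenzelle"},
}


def suggest_links(subtitle: str, user_lang: str) -> List[Tuple[str, str]]:
    """Return (matched phrase, article title in user_lang) for one subtitle line."""
    text = subtitle.lower()
    suggestions: List[Tuple[str, str]] = []
    for phrase, english_title in ANCHORS.items():
        if phrase not in text:
            continue  # naive substring match, an assumption of this sketch
        if user_lang == "en":
            title = english_title
        else:
            # Follow the (toy) interlanguage link; None if no article exists.
            title = INTERLANGUAGE.get(english_title, {}).get(user_lang)
        if title is not None:
            suggestions.append((phrase, title))
    return suggestions
```

Calling `suggest_links` once per incoming subtitle line mimics the streaming setting in a very rough way; a real linker would also need to rank candidates and resolve ambiguous phrases.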

  • [PDF] D. Odijk, E. Meij, D. Graus, and T. Kenter, “Multilingual semantic linking for video streams: making ‘ideas worth sharing’ more accessible,” in Proceedings of the 2nd International Workshop on Web of Linked Entities (WoLE 2013), 2013.
    [Bibtex]
    @inproceedings{WOLE:2013:Odijk,
    Author = {Odijk, Daan and Meij, Edgar and Graus, David and Kenter, Tom},
    Booktitle = {Proceedings of the 2nd International Workshop on Web of Linked Entities (WoLE 2013)},
    Date-Added = {2013-05-15 14:09:58 +0000},
    Date-Modified = {2013-05-15 14:11:37 +0000},
    Title = {Multilingual Semantic Linking for Video Streams: Making "Ideas Worth Sharing" More Accessible},
    Year = {2013}}
Example entity linking for tweets, to support tweets summarization

Personalized Time-Aware Tweets Summarization

To appear as a full paper at SIGIR 2013.

In this paper we focus on selecting meaningful tweets given a user’s interests. Specifically, we consider the task of time-aware tweets summarization, based on a user’s history and collaborative social influences from “social circles.”

Overview of RepLab 2012: Evaluating Online Reputation Management Systems

This paper summarizes the goals, organization and results of the first RepLab competitive evaluation campaign for Online Reputation Management Systems (RepLab 2012). RepLab focused on the reputation of companies, and asked participating systems to annotate different types of information on tweets containing the names of several companies. Two tasks were proposed: a profiling task, where tweets had to be annotated for relevance and polarity for reputation, and a monitoring task, where tweets had to be clustered thematically and clusters had to be ordered by priority (for reputation management purposes). The gold standard consisted of annotations made by reputation management experts, a feature which turns the RepLab 2012 test collection into a useful resource, not only for evaluating systems but also for reaching a better understanding of the notions of polarity and priority in the context of reputation management.

  • [PDF] E. Amigó, A. Corujo, J. Gonzalo, E. Meij, and M. de Rijke, “Overview of RepLab 2012: evaluating online reputation management systems,” in CLEF (Online Working Notes/Labs/Workshop), 2012.
    [Bibtex]
    @inproceedings{CLEF:2012:replab,
    Author = {Enrique Amig{\'o} and Adolfo Corujo and Julio Gonzalo and Edgar Meij and Maarten de Rijke},
    Booktitle = {CLEF (Online Working Notes/Labs/Workshop)},
    Date-Added = {2012-09-20 12:48:33 +0000},
    Date-Modified = {2012-10-30 09:30:49 +0000},
    Title = {Overview of {RepLab} 2012: Evaluating Online Reputation Management Systems},
    Year = {2012}}

Generating Pseudo Test Collections for Learning to Rank Scientific Articles

Pseudo test collections are automatically generated to provide training material for learning to rank methods. We propose a method for generating pseudo test collections in the domain of digital libraries, where data is relatively sparse but comes with rich annotations. Our intuition is that documents are annotated to make them easier to find for certain information needs. We use these annotations and the associated documents as a source for pairs of queries and relevant documents. We investigate how learning to rank performance varies when we use different methods for sampling annotations, and show how our pseudo test collection ranks systems compared to editorial topics with editorial judgments. Our results demonstrate that it is possible to train a learning to rank algorithm on generated pseudo judgments. In some cases, performance is on par with learning on manually obtained ground truth.
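As a rough illustration of turning annotations into query and relevant-document pairs, consider the sketch below. It is only an assumption-laden toy, not the method from the paper: the function name `build_pseudo_collection`, the input layout (document id mapped to a set of annotation terms), and the uniform random sampling of annotations are all hypothetical choices made for this example.

```python
import random
from typing import Dict, List, Set, Tuple


def build_pseudo_collection(
    annotations: Dict[str, Set[str]],
    num_queries: int,
    seed: int = 0,
) -> List[Tuple[str, List[str]]]:
    """Build (pseudo query, relevant doc ids) pairs from document annotations.

    annotations maps a document id to the annotation terms attached to it.
    Each sampled annotation term acts as a pseudo query, and every document
    carrying that term is treated as relevant to it.
    """
    # Invert the mapping: annotation term -> documents annotated with it.
    docs_by_term: Dict[str, Set[str]] = {}
    for doc_id, terms in annotations.items():
        for term in terms:
            docs_by_term.setdefault(term, set()).add(doc_id)

    # Uniform sampling is an assumption; other sampling schemes could be
    # plugged in here.
    rng = random.Random(seed)
    candidates = sorted(docs_by_term)  # sorted for reproducibility
    sampled = rng.sample(candidates, min(num_queries, len(candidates)))

    return [(term, sorted(docs_by_term[term])) for term in sampled]
```

The resulting pairs can then be fed to a learning to rank toolkit as if they were editorial judgments; swapping the sampling step is where different annotation-sampling strategies would be compared.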

  • [PDF] R. Berendsen, M. Tsagkias, M. de Rijke, and E. Meij, “Generating pseudo test collections for learning to rank scientific articles,” in Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics – Third International Conference of the CLEF Initiative, CLEF 2012, 2012.
    [Bibtex]
    @inproceedings{CLEF:2012:berendsen,
    Author = {Berendsen, Richard and Tsagkias, Manos and de Rijke, Maarten and Meij, Edgar},
    Booktitle = {Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics - Third International Conference of the CLEF Initiative, CLEF 2012},
    Date-Added = {2012-07-03 13:44:06 +0200},
    Date-Modified = {2012-10-30 08:37:52 +0000},
    Title = {Generating Pseudo Test Collections for Learning to Rank Scientific Articles},
    Year = {2012}}