Dynamic query modeling for related content finding

While watching television, people increasingly consume additional content related to what they are watching. We consider the task of finding video content related to a live television broadcast, for which we leverage the textual stream of subtitles associated with the broadcast. We model this task as a Markov decision process and propose a method that uses reinforcement learning to directly optimize the retrieval effectiveness of queries generated from the stream of subtitles. Our dynamic query modeling approach significantly outperforms state-of-the-art baselines for stationary query modeling and for text-based retrieval in a television setting. In particular, we find that carefully weighting terms and decaying these weights based on recency significantly improves effectiveness. Moreover, our method is highly efficient and can be used in a live television setting, i.e., in near real time.
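
To make the idea of recency-based term weighting concrete, here is a minimal Python sketch. It is not the method from the paper, which learns its weighting with reinforcement learning; it only illustrates exponential decay of term weights over a stream of subtitle chunks. The function names, the fixed `decay` value, and the toy subtitle terms are all assumptions made for illustration.

```python
from collections import defaultdict


def update_weights(weights, terms, decay=0.9):
    """Decay every existing term weight, then add 1.0 per occurrence of each new term.

    `decay` is a hypothetical fixed factor; the paper optimizes its weighting
    instead of hard-coding one.
    """
    updated = defaultdict(float)
    for term, weight in weights.items():
        updated[term] = weight * decay
    for term in terms:
        updated[term] += 1.0
    return dict(updated)


def top_query_terms(weights, k=3):
    """Return the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy subtitle stream: each inner list stands for one chunk of subtitle terms.
chunks = [["election", "debate"], ["debate", "economy"], ["economy", "jobs"]]

weights = {}
for chunk in chunks:
    weights = update_weights(weights, chunk, decay=0.5)

# The query is rebuilt from the current weights after every chunk.
query = top_query_terms(weights, k=2)
```

With a decay of 0.5, terms from the most recent chunk dominate the query, while terms that have not been mentioned for a while fade out without being dropped abruptly.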

  • [PDF] D. Odijk, E. Meij, I. Sijaranamual, and M. de Rijke, “Dynamic query modeling for related content finding,” in SIGIR 2015: 38th international ACM SIGIR conference on Research and development in information retrieval, 2015.
    [Bibtex]
    @inproceedings{SIGIR:2015:Odijk,
    Author = {Odijk, Daan and Meij, Edgar and Sijaranamual, Isaac and de Rijke, Maarten},
    Booktitle = {{SIGIR 2015: 38th international ACM SIGIR conference on Research and development in information retrieval}},
    Date-Added = {2015-08-06 13:14:13 +0000},
    Date-Modified = {2015-08-06 13:39:24 +0000},
    Month = {August},
    Publisher = {ACM},
    Title = {Dynamic query modeling for related content finding},
    Year = {2015}}

Entity Linking and Retrieval for Semantic Search (WSDM 2014)

This morning, we presented the last edition of our tutorial series on Entity Linking and Retrieval, entitled “Entity Linking and Retrieval for Semantic Search” (with Krisztian Balog and Daan Odijk) at WSDM 2014! This final edition of the series builds upon our earlier tutorials at WWW 2013 and SIGIR 2013. The focus of this edition lies on the practical applications of Entity Linking and Retrieval, in particular for semantic search: more and more search engine users expect direct answers to their information needs (rather than just documents). Semantic search and its recent applications are enabling search engines to organize their wealth of information around entities. Entity linking and retrieval are at the basis of these developments, providing the building blocks for organizing the web of entities.

This tutorial aims to cover all facets of semantic search from a unified point of view and connect real-world applications with results from scientific publications. We provide a comprehensive overview of entity linking and retrieval in the context of semantic search and thoroughly explore techniques for query understanding, entity-based retrieval and ranking on unstructured text, structured knowledge repositories, and a mixture of these. We point out the connections between published approaches and applications, and provide hands-on examples on real-world use cases and datasets.

As before, all our tutorial materials are available for free online at http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/.

Hadoop code for TREC KBA

I’ve decided to put some of the Hadoop code I developed for the TREC KBA task online. It’s available on GitHub: https://github.com/ejmeij/trec-kba. In particular, it provides classes to read/write topic files, read/write run files, and expose the documents in the Thrift files as Hadoop-readable objects (‘ThriftFileInputFormat’) to be used as input to mappers. I obviously also implemented a toy KBA system on Hadoop :-). See GitHub for more info.

A Generative Language Modeling Approach for Ranking Entities

We describe our participation in the INEX 2008 Entity Ranking track. We develop a generative language modeling approach for the entity ranking and list completion tasks. Our framework comprises the following components: (i) entity and (ii) query language models, (iii) entity prior, (iv) the probability of an entity for a given category, and (v) the probability of an entity given another entity. We explore various ways of estimating these components, and report on our results. We find that improving the estimation of these components has very positive effects on performance, yet, there is room for further improvements.
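
As a rough illustration of how an entity language model and an entity prior can be combined for ranking, here is a minimal Python sketch. The abstract does not say how the components are estimated or combined, so everything here is an assumption: the query-likelihood formulation, the Jelinek-Mercer smoothing with parameter `lam`, the uniform prior, and the toy entity descriptions. The category and entity-to-entity components are left out.

```python
import math
from collections import Counter


def score_entity(query_terms, entity_terms, collection_terms, prior, lam=0.5):
    """Return log P(e) + sum over query terms of log((1 - lam) * P(t|e) + lam * P(t|C)).

    Query terms that never occur in the collection are skipped, since their
    smoothed probability would be zero.
    """
    entity_counts = Counter(entity_terms)
    collection_counts = Counter(collection_terms)
    score = math.log(prior)
    for term in query_terms:
        p_collection = collection_counts[term] / len(collection_terms)
        if p_collection == 0.0:
            continue
        p_entity = entity_counts[term] / len(entity_terms) if entity_terms else 0.0
        score += math.log((1 - lam) * p_entity + lam * p_collection)
    return score


# Hypothetical entity descriptions, each reduced to a bag of terms.
entities = {
    "Amsterdam": ["city", "netherlands", "capital"],
    "Rembrandt": ["painter", "dutch", "netherlands"],
}
collection = [term for terms in entities.values() for term in terms]
uniform_prior = 1.0 / len(entities)

scores = {
    name: score_entity(["painter"], terms, collection, uniform_prior)
    for name, terms in entities.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Replacing the uniform prior with an informed one, or adding further log-probability terms for the category and related-entity components, would extend this sketch along the lines the abstract describes.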

  • [PDF] W. Weerkamp, K. Balog, and E. Meij, “A generative language modeling approach for ranking entities,” in Advances in focused retrieval, 2009.
    [Bibtex]
    @inproceedings{INEX:2008:weerkamp,
    Abstract = {We describe our participation in the INEX 2008 Entity Ranking track. We develop a generative language modeling approach for the entity ranking and list completion tasks. Our framework comprises the following components: (i) entity and (ii) query language models, (iii) entity prior, (iv) the probability of an entity for a given category, and (v) the probability of an entity given another entity. We explore various ways of estimating these components, and report on our results. We find that improving the estimation of these components has very positive effects on performance, yet, there is room for further improvements.},
    Author = {Weerkamp, W. and Balog, K. and Meij, E.},
    Booktitle = {Advances in Focused Retrieval},
    Date-Added = {2011-10-16 12:29:08 +0200},
    Date-Modified = {2011-10-16 12:29:08 +0200},
    Organization = {Springer},
    Publisher = {Springer},
    Title = {A Generative Language Modeling Approach for Ranking Entities},
    Year = {2009}}

The University of Amsterdam (ILPS) at INEX 2008

We describe our participation in the INEX 2008 Entity Ranking and Link-the-Wiki tracks. We provide a detailed account of the ideas underlying our approaches to these tasks. For the Link-the-Wiki track, we also report on the results and findings so far.

  • [PDF] W. Weerkamp, J. He, K. Balog, and E. Meij, “The University of Amsterdam (ILPS) at INEX 2008,” in INEX 2008 workshop pre-proceedings, Dagstuhl, 2008.
    [Bibtex]
    @inproceedings{INEX-WS:2008:weerkamp,
    Abstract = {We describe our participation in the INEX 2008 Entity Ranking and Link-the-Wiki tracks. We provide a detailed account of the ideas underlying our approaches to these tasks. For the Link-the-Wiki track, we also report on the results and findings so far.},
    Address = {Dagstuhl},
    Author = {Weerkamp, W. and He, J. and Balog, K. and Meij, E.},
    Booktitle = {INEX 2008 Workshop Pre-Proceedings},
    Date-Added = {2011-10-16 10:36:58 +0200},
    Date-Modified = {2012-10-28 17:30:53 +0000},
    Title = {{The University of Amsterdam (ILPS) at INEX 2008}},
    Year = {2008}}