Plot of a query-specific burst

Adaptive Temporal Query Modeling

We present an approach to query modeling that uses the temporal distribution of documents in an initially retrieved set of documents. Such distributions tend to exhibit bursts, especially in news related document collections. We hypothesize that documents in those bursts are more likely to be relevant than others. Predicated on this, we expand queries with the most distinguishing terms in high quality documents sampled from bursts. We show how the most commonly used decay function for recent document retrieval can be used as probabilistic model for temporal retrieval in general. The effectiveness of our models is demonstrated on both news collections and a collection of blog posts.

  • [PDF] M. Peetz, E. Meij, M. de Rijke, and W. Weerkamp, “Adaptive temporal query modeling,” in Advances in information retrieval – 34th european conference on ir research, ecir 2012, 2012.
    Author = {Peetz, Maria-Hendrike and Meij, Edgar and de Rijke, Maarten and Weerkamp, Wouter},
    Booktitle = {Advances in Information Retrieval - 34th European Conference on IR Research, ECIR 2012},
    Date-Added = {2011-11-23 18:10:40 +0100},
    Date-Modified = {2012-10-28 23:01:12 +0000},
    Title = {Adaptive Temporal Query Modeling},
    Year = {2012}}
thesis cover image of a smart computer

Combining Concepts and Language Models for Information Access

Since the middle of last century, information retrieval has gained an increasing interest. Since its inception, much research has been devoted to finding optimal ways of representing both documents and queries, as well as improving ways of matching one with the other. In cases where document annotations or explicit semantics are available, matching algorithms can be informed using the concept languages in which such semantics are usually defined. These algorithms are able to match queries and documents based on textual and semantic evidence.

Recent advances have enabled the use of rich query representations in the form of query language models. This, in turn, allows us to account for the language associated with concepts within the retrieval model in a principled and transparent manner. Developments in the semantic web community, such as the Linked Open Data cloud, have enabled the association of texts with concepts on a large scale. Taken together, these developments facilitate a move beyond manually assigned concepts in domain-specific contexts into the general domain.

This thesis investigates how one can improve information access by employing the actual use of concepts as measured by the language that people use when they discuss them. The main contribution is a set of models and methods that enable users to retrieve and access information on a conceptual level. Through extensive evaluations, a systematic exploration and thorough analysis of the experimental results of the proposed models is performed. Our empirical results show that a combination of top-down conceptual information and bottom-up statistical information obtains optimal performance on a variety of tasks and test collections.

See for more information.

  • [PDF] E. Meij, “Combining concepts and language models for information access,” PhD Thesis, 2010.
    Author = {Meij, Edgar},
    Date-Added = {2011-10-20 10:18:00 +0200},
    Date-Modified = {2011-10-22 12:23:33 +0200},
    School = {University of Amsterdam},
    Title = {Combining Concepts and Language Models for Information Access},
    Year = {2010}}


Using Prior Information Derived from Citations in Literature Search

Researchers spend a large amount of their time searching through an ever increasing number of scientific articles. Although users of scientific literature search engines prefer the ranking of results according to the number of citations a publication has received, it is unknown whether this notion of authoritativeness could also benefit more traditional and objective measures. Is it also an indicator of relevance, given an information need? In this paper, we examine the relationship between citation features of a scientific article and its prior probability of actually being relevant to an information need. We propose various ways of modeling this relationship and show how this kind of contextual information can be incorporated within a language modeling framework. We experiment with three document priors, which we evaluate on three distinct sets of queries and two document collections from the TREC Genomics track. Empirical results show that two of the proposed priors can significantly improve retrieval effectiveness, measured in terms of mean average precision.

  • [PDF] E. Meij and M. de Rijke, “Using prior information derived from citations in literature search,” in Riao 2007, 2007.
    Author = {Meij, E. and de Rijke, M.},
    Booktitle = {RIAO 2007},
    Date-Added = {2011-10-13 09:05:34 +0200},
    Date-Modified = {2012-10-30 08:49:59 +0000},
    Title = {Using Prior Information Derived from Citations in Literature Search},
    Year = {2007}}