TREC

Incorporating Non-Relevance Information in the Estimation of Query Models

We describe the participation of the University of Amsterdam’s ILPS group in the relevance feedback track at TREC 2008. We introduce a new model which incorporates information from relevant and non-relevant documents to improve the estimation of query models. Our main findings are twofold: (i) in terms of statMAP, a larger number of judged non-relevant documents improves retrieval effectiveness and (ii) on the TREC Terabyte topics, we can effectively replace the estimates on the judged non-relevant documents with estimations on the document collection.
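
The abstract describes the approach only at a high level. As a rough, engine-agnostic sketch (not the paper's actual estimator), the snippet below builds a query model from judged relevant documents and discounts terms that are also prominent in judged non-relevant documents; per the abstract, the non-relevant estimates could instead come from the document collection. The function names, discounting scheme, and parameter values are illustrative assumptions.

```python
from collections import Counter

def term_dist(docs):
    """Maximum-likelihood term distribution over a list of tokenized documents."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def query_model(relevant_docs, non_relevant_docs, alpha=0.5, top_k=10):
    """Illustrative contrastive weighting: a term's weight in the relevant
    documents is reduced in proportion to how strongly the non-relevant
    documents favor it; the top terms are then renormalized."""
    p_rel = term_dist(relevant_docs)
    p_nrel = term_dist(non_relevant_docs)
    scores = {t: p * (1.0 - alpha * p_nrel.get(t, 0.0) / (p + p_nrel.get(t, 0.0)))
              for t, p in p_rel.items()}
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    norm = sum(w for _, w in top)
    return {t: w / norm for t, w in top}

# Usage: expanded = query_model(tokenized_relevant, tokenized_non_relevant)
```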

  • [PDF] E. Meij, W. Weerkamp, J. He, and M. de Rijke, “Incorporating non-relevance information in the estimation of query models,” in The seventeenth text retrieval conference, 2009.
    [Bibtex]
    @inproceedings{TREC:2009:meij,
    Abstract = {We describe the participation of the University of Amsterdam's ILPS group in the relevance feedback track at TREC 2008. We introduce a new model which incorporates information from relevant and non-relevant documents to improve the estimation of query models. Our main findings are twofold: (i) in terms of statMAP, a larger number of judged non-relevant documents improves retrieval effectiveness and (ii) on the TREC Terabyte topics, we can effectively replace the estimates on the judged non-relevant documents with estimations on the document collection.},
    Author = {Meij, E. and Weerkamp, W. and He, J. and de Rijke, M.},
    Booktitle = {The Seventeenth Text REtrieval Conference},
    Date-Added = {2011-10-16 16:03:56 +0200},
    Date-Modified = {2012-10-30 09:23:32 +0000},
    Series = {TREC 2008},
    Title = {Incorporating Non-Relevance Information in the Estimation of Query Models},
    Year = {2009}}

INEX

The University of Amsterdam (ILPS) at INEX 2008

We describe our participation in the INEX 2008 Entity Ranking and Link-the-Wiki tracks. We provide a detailed account of the ideas underlying our approaches to these tasks. For the Link-the-Wiki track, we also report on the results and findings so far.

  • [PDF] W. Weerkamp, J. He, K. Balog, and E. Meij, “The University of Amsterdam (ILPS) at INEX 2008,” in INEX 2008 workshop pre-proceedings, Dagstuhl, 2008.
    [Bibtex]
    @inproceedings{INEX-WS:2008:weerkamp,
    Abstract = {We describe our participation in the INEX 2008 Entity Ranking and Link-the-Wiki tracks. We provide a detailed account of the ideas underlying our approaches to these tasks. For the Link-the-Wiki track, we also report on the results and findings so far.},
    Address = {Dagstuhl},
    Author = {Weerkamp, W. and He, J. and Balog, K. and Meij, E.},
    Booktitle = {INEX 2008 Workshop Pre-Proceedings},
    Date-Added = {2011-10-16 10:36:58 +0200},
    Date-Modified = {2012-10-28 17:30:53 +0000},
    Title = {{The University of Amsterdam (ILPS) at INEX 2008}},
    Year = {2008}}

CLEF

The University of Amsterdam at the CLEF 2008 Domain Specific Track – Parsimonious Relevance and Concept Models

We describe our participation in the CLEF 2008 Domain Specific track. The research questions we address are threefold: (i) what are the effects of estimating and applying relevance models to the domain specific collection used at CLEF 2008, (ii) what are the results of parsimonizing these relevance models, and (iii) what are the results of applying concept models for blind relevance feedback? Parsimonization is a technique by which the term probabilities in a language model may be re-estimated based on a comparison with a reference model, making the resulting model more sparse and to the point. Concept models are term distributions over vocabulary terms, based on the language associated with concepts in a thesaurus or ontology and are estimated using the documents which are annotated with concepts. Concept models may be used for blind relevance feedback, by first translating a query to concepts and then back to query terms. We find that applying relevance models helps significantly for the current test collection, in terms of both mean average precision and early precision. Moreover, parsimonizing the relevance models helps mean average precision on title-only queries and early precision on title+narrative queries. Our concept models are able to significantly outperform a baseline query-likelihood run, both in terms of mean average precision and early precision on both title-only and title+narrative queries.
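
The parsimonization step mentioned above is, in general form, an EM re-estimation of a language model against a background (collection) model, pushing probability mass onto terms the background explains poorly and pruning the rest. The sketch below shows that general recipe; the smoothing weight, iteration count, and pruning threshold are illustrative and not the settings used in the paper.

```python
def parsimonize(term_freqs, background, lam=0.1, iters=10, threshold=1e-4):
    """Re-estimate a term distribution against a background model with EM,
    keeping only terms that the background does not already explain.
    term_freqs: {term: count in the feedback documents}
    background: {term: collection probability}"""
    total = sum(term_freqs.values())
    p = {t: c / total for t, c in term_freqs.items()}  # initial MLE
    for _ in range(iters):
        # E-step: expected share of each term's count explained by the
        # parsimonious model rather than the background model
        e = {t: term_freqs[t] * (lam * p[t])
                / (lam * p[t] + (1 - lam) * background.get(t, 1e-9))
             for t in p}
        # M-step: renormalize and prune terms with negligible mass
        norm = sum(e.values())
        p = {t: v / norm for t, v in e.items() if v / norm > threshold}
    norm = sum(p.values())
    return {t: v / norm for t, v in p.items()}
```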

  • [PDF] E. Meij and M. de Rijke, “The University of Amsterdam at the CLEF 2008 Domain Specific Track – parsimonious relevance and concept models,” in Working notes for the CLEF 2008 workshop, 2008.
    [Bibtex]
    @inproceedings{CLEF-WN:2008:meij,
    Author = {Edgar Meij and Maarten de Rijke},
    Booktitle = {Working Notes for the CLEF 2008 Workshop},
    Date-Added = {2011-10-12 18:31:55 +0200},
    Date-Modified = {2012-10-30 09:28:58 +0000},
    Title = {The {U}niversity of {A}msterdam at the {CLEF} 2008 {Domain Specific Track} - Parsimonious Relevance and Concept Models},
    Year = {2008}}

Text Mining

Bootstrapping Language Associated with Biomedical Entities

The TREC Genomics 2007 task included recognizing topic-specific entities in the returned passages. To address this task, we have designed and implemented a novel data-driven approach by combining information extraction with language modeling techniques. Instead of using an exhaustive list of all possible instances for an entity type, we look at the language usage around each entity type and use that as a classifier to determine whether or not a piece of text discusses such an entity type. We do so by comparing it with language models of the passages. For example, given the entity type “genes”, our approach can measure the gene-iness of a piece of text.

Our algorithm works as follows. Given an entity type, it first uses Hearst patterns to extract instances of the type. To extract more instances, we look for new contextual patterns around the instances and use them as input for a bootstrapping method, in which new instances and patterns are discovered iteratively. Afterwards, all discovered instances and patterns are used to find the sentences in the collection which are most on par with the requested entity type. A language model is then generated from these sentences and, at retrieval time, we use this model to rerank retrieved passages.
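
A highly simplified sketch of this pipeline is given below. The single Hearst-style pattern, the unscored bootstrapping loop, the sentence selection, and the log-likelihood reranking are all illustrative placeholders for the richer machinery the paper describes.

```python
import re
from math import log
from collections import Counter

def hearst_seeds(entity_type, sentences):
    """Seed instances from one Hearst-style pattern: '<type>s such as X, Y and Z'."""
    pat = re.compile(rf"{entity_type}s?\s+such as\s+([\w ,-]+)", re.I)
    seeds = set()
    for s in sentences:
        for m in pat.finditer(s):
            seeds.update(w.strip() for w in re.split(r",| and ", m.group(1)) if w.strip())
    return seeds

def bootstrap(instances, sentences, iterations=3, window=2):
    """Alternate between harvesting left-context patterns around known instances
    and using those patterns to pick up new instances (no pattern scoring here)."""
    for _ in range(iterations):
        patterns = set()
        for s in sentences:
            toks = s.split()
            for i, tok in enumerate(toks):
                if tok in instances and i >= window:
                    patterns.add(tuple(toks[i - window:i]))
        for s in sentences:
            toks = s.split()
            for i in range(window, len(toks)):
                if tuple(toks[i - window:i]) in patterns:
                    instances.add(toks[i])
    return instances

def entity_type_model(instances, sentences):
    """Unigram language model over the sentences that mention a known instance."""
    counts = Counter(t for s in sentences if instances & set(s.split()) for t in s.split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def rerank(passages, model):
    """Rerank retrieved passages by log-likelihood under the entity-type model."""
    return sorted(passages,
                  key=lambda p: sum(log(model.get(t, 1e-9)) for t in p.split()),
                  reverse=True)
```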

As to the results of our submitted runs, we find that our baseline run performs well above the median of all participants’ scores. Additionally, we find that applying our proposed method helps most for the entity types with unambiguous patterns and numerous instances.

  • [PDF] E. Meij and S. Katrenko, “Bootstrapping language associated with biomedical entities,” in The sixteenth text retrieval conference, 2008.
    [Bibtex]
    @inproceedings{TREC:2008:meij,
    Author = {Meij, E. and Katrenko, S.},
    Booktitle = {The Sixteenth Text REtrieval Conference},
    Date-Added = {2011-10-16 10:24:41 +0200},
    Date-Modified = {2012-10-30 09:23:12 +0000},
    Series = {TREC 2007},
    Title = {Bootstrapping Language Associated with Biomedical Entities},
    Year = {2008}}

TREC

Language Models for Enterprise Search: Query Expansion and Combination of Evidence

We describe our participation in the TREC 2006 Enterprise track. We provide a detailed account of the ideas underlying our language modeling approaches to both the discussion search and expert search tasks. For discussion search, our focus was on query expansion techniques, using additional information from the topic statement and from message threads; while the former was generally helpful, the latter mostly hurt performance. In expert search our main experiments concerned query expansion as well as combinations of expert finding and expert profiling techniques.

  • [PDF] K. Balog, E. Meij, and M. de Rijke, “The University of Amsterdam at the TREC 2006 Enterprise Track,” in The fifteenth text retrieval conference, 2007.
    [Bibtex]
    @inproceedings{TREC:2006:balog,
    Author = {Balog, K. and Meij, E. and de Rijke, M.},
    Booktitle = {The Fifteenth Text REtrieval Conference},
    Date-Added = {2011-10-12 23:33:06 +0200},
    Date-Modified = {2012-10-30 09:23:12 +0000},
    Series = {TREC 2006},
    Title = {{The University of Amsterdam at the TREC 2006 Enterprise Track}},
    Year = {2007}}

TREC

Expanding Queries Using Multiple Resources

We describe our participation in the TREC 2006 Genomics track, in which our main focus was on query expansion. We hypothesized that applying query expansion techniques would help us both to identify and retrieve synonymous terms, and to cope with ambiguity. To this end, we developed several collection-specific as well as web-based strategies. We also performed post-submission experiments, in which we compare various retrieval engines, such as Lucene, Indri, and Lemur, using a simple baseline topic set. When indexing entire paragraphs as pseudo-documents, we find that Lemur is able to achieve the highest document-, passage-, and aspect-level scores, using the KL-divergence method and its default settings. Additionally, we index the collection at a lower level of granularity, by creating pseudo-documents consisting of individual sentences. When we search these instead of paragraphs in Lucene, the passage-level scores improve considerably. Finally, we note that stemming improves overall scores by at least 10%.
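
To make the indexing-granularity experiment above concrete: both runs index pseudo-documents rather than full articles, differing only in how the text is split. The engine-agnostic sketch below shows that construction; the naive splitting rules and identifier scheme are illustrative, and the actual indexing and retrieval in the paper were done with Lucene, Indri, and Lemur.

```python
import re

def paragraph_pseudo_docs(doc_id, text):
    """One pseudo-document per paragraph: the coarser indexing unit."""
    parts = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return {f"{doc_id}.p{i}": p for i, p in enumerate(parts)}

def sentence_pseudo_docs(doc_id, text):
    """One pseudo-document per (naively split) sentence: the finer-grained unit
    that improved passage-level scores when searched with Lucene."""
    parts = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return {f"{doc_id}.s{i}": s for i, s in enumerate(parts)}
```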

  • [PDF] E. Meij, M. Jansen, and M. de Rijke, “Expanding queries using multiple resources (the AID group at TREC 2006: genomics track),” in The fifteenth text retrieval conference, 2007.
    [Bibtex]
    @inproceedings{TREC:2006:meij,
    Author = {Meij, E. and Jansen, M. and de Rijke, M.},
    Booktitle = {The Fifteenth Text REtrieval Conference},
    Date-Added = {2011-10-12 23:24:14 +0200},
    Date-Modified = {2012-10-30 09:23:12 +0000},
    Series = {TREC 2006},
    Title = {Expanding Queries Using Multiple Resources (The {AID} Group at {TREC} 2006: Genomics Track)},
    Year = {2007}}

TREC

Combining Thesauri-based Methods for Biomedical Retrieval

This paper describes our participation in the TREC 2005 Genomics track. We took part in the ad hoc retrieval task and aimed at integrating thesauri in the retrieval model. We developed three thesauri-based methods, two of which made use of the existing MeSH thesaurus. One method uses blind relevance feedback on MeSH terms, the second uses an index of the MeSH thesaurus for query expansion. The third method makes use of a dynamically generated lookup list, by which acronyms and synonyms could be inferred. We show that, despite the relatively minor improvements in retrieval performance of individually applied methods, a combination works best and is able to deliver significant improvements over the baseline.
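
As an illustration of the combination idea (the specific sources, weights, and merge rule below are assumptions, not the paper's formulation), the three thesaurus-based methods can be viewed as producing candidate expansion terms that are merged into a single weighted query:

```python
def combine_expansions(query_terms, mesh_feedback_terms, mesh_index_terms,
                       synonym_table, expansion_weight=0.4):
    """Merge expansion terms from the three thesaurus-based sources into one
    weighted query: original terms keep weight 1.0, expansion terms get a
    smaller weight, and duplicates are not double-counted."""
    weights = {t: 1.0 for t in query_terms}
    for term in list(mesh_feedback_terms) + list(mesh_index_terms):
        weights.setdefault(term, expansion_weight)
    for term in query_terms:
        for syn in synonym_table.get(term, []):
            weights.setdefault(syn, expansion_weight)
    return weights
```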

  • [PDF] E. Meij, L. H. L. IJzereef, L. A. Azzopardi, J. Kamps, M. de Rijke, M. Voorhees, and L. P. Buckland, “Combining thesauri-based methods for biomedical retrieval,” in The fourteenth text retrieval conference, 2006.
    [Bibtex]
    @inproceedings{TREC:2005:meij,
    Author = {Meij, E. and IJzereef, L.H.L. and Azzopardi, L.A. and Kamps, J. and de Rijke, M. and Voorhees, M. and Buckland, L.P.},
    Booktitle = {The Fourteenth Text REtrieval Conference},
    Date-Added = {2011-10-12 23:16:44 +0200},
    Date-Modified = {2012-10-30 09:23:12 +0000},
    Series = {TREC 2005},
    Title = {Combining Thesauri-based Methods for Biomedical Retrieval},
    Year = {2006}}