TREC

Language Models for Enterprise Search: Query Expansion and Combination of Evidence

We describe our participation in the TREC 2006 Enterprise track. We provide a detailed account of the ideas underlying our language modeling approaches to both the discussion search and expert search tasks. For discussion search, our focus was on query expansion techniques, using additional information from the topic statement and from message threads; while the former was generally helpful, the latter mostly hurt performance. In expert search, our main experiments concerned query expansion as well as combinations of expert finding and expert profiling techniques.

  • [PDF] K. Balog, E. Meij, and M. de Rijke, “The University of Amsterdam at the TREC 2006 Enterprise Track,” in The Fifteenth Text REtrieval Conference, 2007.
    [Bibtex]
    @inproceedings{TREC:2006:balog,
    Author = {Balog, K. and Meij, E. and de Rijke, M.},
    Booktitle = {The Fifteenth Text REtrieval Conference},
    Date-Added = {2011-10-12 23:33:06 +0200},
    Date-Modified = {2012-10-30 09:23:12 +0000},
    Series = {TREC 2006},
    Title = {{The University of Amsterdam at the TREC 2006 Enterprise Track}},
    Year = {2007}}
TREC

Expanding Queries Using Multiple Resources

We describe our participation in the TREC 2006 Genomics track, in which our main focus was on query expansion. We hypothesized that applying query expansion techniques would help us both to identify and retrieve synonymous terms, and to cope with ambiguity. To this end, we developed several collection-specific as well as web-based strategies. We also performed post-submission experiments, in which we compare various retrieval engines, such as Lucene, Indri, and Lemur, using a simple baseline topic set. When indexing entire paragraphs as pseudo-documents, we find that Lemur is able to achieve the highest document-, passage-, and aspect-level scores, using the KL-divergence method and its default settings. Additionally, we index the collection at a finer level of granularity, by creating pseudo-documents consisting of individual sentences. When we search these instead of paragraphs in Lucene, the passage-level scores improve considerably. Finally, we note that stemming improves overall scores by at least 10%.

  • [PDF] E. Meij, M. Jansen, and M. de Rijke, “Expanding Queries Using Multiple Resources (The AID Group at TREC 2006: Genomics Track),” in The Fifteenth Text REtrieval Conference, 2007.
    [Bibtex]
    @inproceedings{TREC:2006:meij,
    Author = {Meij, E. and Jansen, M. and de Rijke, M.},
    Booktitle = {The Fifteenth Text REtrieval Conference},
    Date-Added = {2011-10-12 23:24:14 +0200},
    Date-Modified = {2012-10-30 09:23:12 +0000},
    Series = {TREC 2006},
    Title = {Expanding Queries Using Multiple Resources (The {AID} Group at {TREC} 2006: Genomics Track)},
    Year = {2007}}