CIKM 2014

Time-Aware Rank Aggregation for Microblog Search

We tackle the problem of searching microblog posts and frame it as a rank aggregation problem where we merge result lists generated by separate rankers so as to produce a final ranking to be returned to the user. We propose a rank aggregation method, TimeRA, that is able to infer the rank scores of documents via latent factor modeling. It is time-aware and rewards posts that are published in or near a burst of posts that are ranked highly in many of the lists being aggregated. Our experimental results show that it significantly outperforms state-of-the-art rank aggregation and time-sensitive microblog search algorithms.

P30 difference plot

Team COMMIT at TREC 2011

We describe the participation of Team COMMIT in this year’s Microblog and Entity track.

In our participation in the Microblog track, we used a feature-based approach. Specifically, we pursued a precision oriented recency-aware retrieval approach for tweets. Amongst others we used various types of external data. In particular, we examined the potential of link retrieval on a corpus of crawled content pages and we use semantic query expansion using Wikipedia. We also deployed pre-filtering based on query-dependent and query-independent features. For the Microblog track we found that a simple cut-off based on the z-score is not sufficient: for differently distributed scores, this can decrease recall. A well set cut-off parameter can however significantly increase precision, especially if there are few highly relevant tweets. Filtering based on query-independent filtering does not help for already small result list. With a high occurrence of links in relevant tweets, we found that using link retrieval helps improving precision and recall for highly relevant and relevant tweets. Future work should focus on a score-distribution dependent selection criterion.

In this years Entity track participation we focused on the Entity List Completion (ELC) task. We experimented with a text based and link based approach to retrieve entities in Linked Data (LD). Additionally we experimented with selecting candidate entities from a web corpus. Our intuition is that entities occurring on pages with many of the example entities are more likely to be good candidates than entities that do not. For the Entity track there are no analyses or conclusions to report yet; at the time of writing no evaluation results are available for the Entity track.

  • [PDF] M. Bron, E. Meij, M. Peetz, M. Tsagkias, and M. de Rijke, “Team COMMIT at TREC 2011,” in The twentieth text retrieval conference, 2012.
    Author = {Bron, Marc and Meij, Edgar and Peetz, Maria-Hendrike and Tsagkias, Manos and de Rijke, Maarten},
    Booktitle = {The Twentieth Text REtrieval Conference},
    Date-Added = {2011-10-22 12:22:19 +0200},
    Date-Modified = {2012-10-30 09:26:12 +0000},
    Series = {TREC 2011},
    Title = {Team {COMMIT} at {TREC 2011}},
    Year = {2012}}