TREC 2012 summary

09/11/2012

At the 21st Text REtrieval Conference (TREC 2012), seven tracks ran: KBA, Contextual Suggestion, Session, Web, Medical, Crowdsourcing, and Microblog. Of these, Microblog attracted the largest number of participating groups (40), followed by Medical (24). UvA mainly participated in KBA (Knowledge Base Acceleration), as one of 11 participating groups. The KBA task is a typical cumulative citation recommendation task, in which a stream of documents is filtered for relevance. Here, relevance is defined with respect to entities: the use case is a Wikipedia editor who is interested in a certain Wikipedia article (entity) and needs to be notified of “interesting” documents that are “central” to that entity.

In our participation we evaluated a previously proposed approach on the KBA test collection and extended it to accommodate the temporal, evolving nature of the document collection. Our official runs contained a bug, but the repaired version obtains encouraging performance (and can be found online). The KBA approaches varied to some degree, although most of them used entity “representations” in one way or another. CWI, for instance, used the Google anchor-concept dump, UDel used outlinks inside the Wikipedia article, and UIUC (Miles Efron) used the article’s edit history. UMass (Jeff Dalton) used the same entity name variants as we did, including titles, redirects, and anchors. UMass also explicitly addressed the connection between TREC-KBA and TAC-KBP, hopefully resulting in the two tracks moving closer together (KBx? KBY?). In any case, I’m looking forward to next year’s KBA, where most of these approaches will (hopefully) be combined to further improve performance.
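To make the idea of entity name variants concrete, here is a minimal sketch of how a document stream could be pre-filtered using the titles, redirects, and anchors mentioned above. This is not our actual system, nor any other participant's. The `Entity` class, the `filter_stream` function, and the plain substring matching are all assumptions made for illustration. A filter like this only selects candidate documents that mention the entity; deciding whether a document is “central” would need a further step.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Entity:
    """A target entity and its surface forms (hypothetical structure)."""
    title: str
    redirects: list[str] = field(default_factory=list)
    anchors: list[str] = field(default_factory=list)

    def variants(self) -> set[str]:
        # Lowercase every known name so matching is case-insensitive.
        names = [self.title, *self.redirects, *self.anchors]
        return {name.lower() for name in names if name}


def filter_stream(docs: Iterable[tuple[str, str]], entity: Entity) -> Iterator[str]:
    """Yield the ids of documents that mention any name variant of the entity.

    `docs` is an iterable of (doc_id, text) pairs, consumed in stream order.
    Matching is naive substring search, an assumption for this sketch.
    """
    variants = entity.variants()
    for doc_id, text in docs:
        lowered = text.lower()
        if any(variant in lowered for variant in variants):
            yield doc_id
```

Because `filter_stream` is a generator, it processes documents one at a time as they arrive, which fits the streaming setting of the task.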

As for TREC 2013, there will be quite a few changes. TREC Medical will be discontinued, mainly due to issues with the medical records document collection. There will be two new tracks: Temporal summarization (Fernando Diaz) and Federated web search (Djoerd Hiemstra). The first one seems especially interesting from a KBA point of view. Furthermore, TREC-TempSum will use the same document collection as TREC-KBA in 2013 (hopefully also including tweets and Facebook updates), which may foster further integration between the tracks. Although nothing has been decided yet, TREC-Microblog (called TREC-RealTime next year) is contemplating using this collection as well. Speaking of new collections, TREC-Web 2013 also features a new collection, ClueWeb12 (as well as a new task: risk-sensitive retrieval). And it seems ClueWeb12 might also be used by the contextual suggestion and crowdsourcing tracks in 2013.

All in all, it was an exciting edition of TREC with lots of interesting discussions. Food for thought, not only for next year’s TREC but also for the upcoming SIGIR deadline :).

Edgar Meij
