We describe the participation of Team COMMIT in this year's Microblog and Entity tracks.

In our participation in the Microblog track, we used a feature-based approach. Specifically, we pursued a precision-oriented, recency-aware retrieval approach for tweets. Among other things, we used several types of external data: we examined the potential of link retrieval on a corpus of crawled content pages, and we used semantic query expansion based on Wikipedia. We also deployed pre-filtering based on query-dependent and query-independent features. We found that a simple cut-off based on the z-score is not sufficient: for differently distributed scores, it can decrease recall. A well-set cut-off parameter can, however, significantly increase precision, especially if there are few highly relevant tweets. Filtering on query-independent features does not help when the result list is already small. Given the high occurrence of links in relevant tweets, we found that link retrieval helps improve precision and recall for both highly relevant and relevant tweets. Future work should focus on a selection criterion that depends on the score distribution.
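The z-score cut-off mentioned above can be illustrated with a minimal sketch. This is not the implementation used in the paper: the function name `zscore_cutoff`, the default `threshold`, and the toy scores are all assumptions made for illustration. The sketch keeps only tweets whose retrieval score lies at least `threshold` standard deviations above the mean score of the result list.

```python
from statistics import mean, pstdev


def zscore_cutoff(scored_tweets, threshold=1.0):
    """Keep (tweet, score) pairs whose z-score is at least `threshold`.

    Hypothetical helper for illustration; `threshold` is an assumed parameter.
    """
    if not scored_tweets:
        return []
    scores = [score for _, score in scored_tweets]
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation of the result list
    if sigma == 0:
        # All scores are identical, so the z-score is undefined: keep everything.
        return list(scored_tweets)
    return [(t, s) for t, s in scored_tweets if (s - mu) / sigma >= threshold]


# One clear outlier: the cut-off isolates it.
peaked = [("a", 9.0), ("b", 1.0), ("c", 1.0), ("d", 1.0), ("e", 1.0)]
# Evenly spread scores: the same threshold still keeps only the top tweet.
flat = [("a", 5.0), ("b", 4.0), ("c", 3.0), ("d", 2.0), ("e", 1.0)]

print(zscore_cutoff(peaked))
print(zscore_cutoff(flat))
```

With the peaked list, the cut-off keeps the single outlier, which favours precision. With the evenly spread list, the same threshold also keeps only the top tweet, so any relevant tweets further down would be lost; this is the kind of recall loss that a fixed cut-off can cause when score distributions differ.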

In this year's Entity track participation, we focused on the Entity List Completion (ELC) task. We experimented with a text-based and a link-based approach to retrieving entities in Linked Data (LD). Additionally, we experimented with selecting candidate entities from a web corpus. Our intuition is that entities occurring on pages that contain many of the example entities are more likely to be good candidates than entities that do not. For the Entity track there are no analyses or conclusions to report yet; at the time of writing, no evaluation results are available.
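The co-occurrence intuition above can be sketched as follows. This is an illustrative assumption, not the scoring used in the paper: the function `rank_candidates`, the representation of a page as a set of entity names, and the choice to sum the number of example entities per page are all hypothetical.

```python
from collections import Counter


def rank_candidates(pages, examples):
    """Rank non-example entities by co-occurrence with example entities.

    `pages` is an iterable of entity collections, one per web page.
    A candidate gains, for every page it appears on, the number of
    example entities found on that page (an assumed scoring rule).
    """
    examples = set(examples)
    scores = Counter()
    for entities in pages:
        entities = set(entities)
        overlap = len(entities & examples)
        if overlap == 0:
            continue  # pages without any example entity contribute nothing
        for candidate in entities - examples:
            scores[candidate] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


pages = [
    {"A", "B", "X"},
    {"A", "Y"},
    {"Z"},
    {"A", "B", "X", "Y"},
]
print(rank_candidates(pages, examples={"A", "B"}))
```

Here `X` ranks above `Y` because it appears only on pages that contain both example entities, while `Z` never co-occurs with an example and is not proposed at all.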

  • M. Bron, E. Meij, M. Peetz, M. Tsagkias, and M. de Rijke, “Team COMMIT at TREC 2011,” in Proceedings of The Twentieth Text REtrieval Conference, TREC 2011, 2011.
    [Bibtex]
    @inproceedings{TREC:2011:commit,
      Author = {Bron, Marc and Meij, Edgar and Peetz, Maria-Hendrike and Tsagkias, Manos and de Rijke, Maarten},
      Booktitle = {Proceedings of The Twentieth Text REtrieval Conference, TREC 2011},
      Date-Added = {2011-10-22 12:22:19 +0200},
      Date-Modified = {2012-02-12 14:02:18 +0100},
      Editor = {Ellen M. Voorhees and Lori Buckland},
      Publisher = {National Institute of Standards and Technology ({NIST})},
      Title = {Team {COMMIT} at {TREC 2011}},
      Year = {2011}}