Overview of RepLab 2012: Evaluating Online Reputation Management Systems

This paper summarizes the goals, organization and results of the first RepLab competitive evaluation campaign for Online Reputation Management Systems (RepLab 2012). RepLab focused on the reputation of companies, and asked participant systems to annotate different types of information on tweets containing the names of several companies. Two tasks were proposed: a profiling task, where tweets had to be annotated for relevance and polarity for reputation, and a monitoring task, where tweets had to be clustered thematically and clusters had to be ordered by priority (for reputation management purposes). The gold standard consisted of annotations made by reputation management experts, a feature which turns the RepLab 2012 test collection into a useful source not only to evaluate systems, but also to reach a better understanding of the notions of polarity and priority in the context of reputation management.

  • [PDF] E. Amigó, A. Corujo, J. Gonzalo, E. Meij, and M. de Rijke, “Overview of RepLab 2012: evaluating online reputation management systems,” in CLEF (online working notes/labs/workshop), 2012.
    [Bibtex]
    @inproceedings{CLEF:2012:replab,
    Author = {Enrique Amig{\'o} and Adolfo Corujo and Julio Gonzalo and Edgar Meij and Maarten de Rijke},
    Booktitle = {CLEF (Online Working Notes/Labs/Workshop)},
    Date-Added = {2012-09-20 12:48:33 +0000},
    Date-Modified = {2012-10-30 09:30:49 +0000},
    Title = {Overview of {RepLab} 2012: Evaluating Online Reputation Management Systems},
    Year = {2012}}

The University of Amsterdam at the TREC 2011 Session Track

We describe the participation of the University of Amsterdam’s ILPS group in the Session track at TREC 2011.

The stream of interactions created by a user engaging with a search system contains a wealth of information. For retrieval purposes, previous interactions can help inform us about a user’s current information need. Building on this intuition, our contribution to this year’s TREC Session track focuses on session modeling and learning to rank using session information. In this paper, we present and compare three complementary strategies that we designed for improving retrieval for a current query using previous queries and clicked results: probabilistic session modeling, semantic query modeling, and implicit feedback.

In our experiments we examined three complementary strategies for improving retrieval for a current query. Our first strategy, based on probabilistic session modeling, was the best performing strategy.

Our second strategy, based on semantic query modeling, did less well than we expected, likely due to topic drift from excessively aggressive query expansion. We expect that performance of this strategy would improve by limiting the number of terms and/or improving the probability estimates.

With respect to our third strategy, based on learning from feedback, we found that learning weights for linear weighted combinations of features from an external collection can be beneficial if the characteristics of that collection are similar to the current data. Feedback available in the form of user clicks appeared to be less beneficial. Our run that learned from implicit feedback performed substantially worse than a run where weights were learned from an external collection with explicit feedback, using the same learning algorithm and set of features.

  • [PDF] B. Huurnink, R. Berendsen, K. Hofmann, E. Meij, and M. de Rijke, “The University of Amsterdam at the TREC 2011 session track,” in The twentieth text retrieval conference, 2012.
    [Bibtex]
    @inproceedings{TREC:2011:huurnink,
    Author = {Huurnink, Bouke and Berendsen, Richard and Hofmann, Katja and Meij, Edgar and de Rijke, Maarten},
    Booktitle = {The Twentieth Text REtrieval Conference},
    Date-Added = {2011-10-22 12:22:18 +0200},
    Date-Modified = {2013-05-22 11:44:53 +0000},
    Month = {January},
    Series = {TREC 2011},
    Title = {The {University of Amsterdam} at the {TREC} 2011 Session Track},
    Year = {2012}}

Team COMMIT at TREC 2011

We describe the participation of Team COMMIT in this year’s Microblog and Entity tracks.

In our participation in the Microblog track, we used a feature-based approach. Specifically, we pursued a precision-oriented, recency-aware retrieval approach for tweets. Among other things, we used various types of external data. In particular, we examined the potential of link retrieval on a corpus of crawled content pages and we used semantic query expansion based on Wikipedia. We also deployed pre-filtering based on query-dependent and query-independent features. For the Microblog track we found that a simple cut-off based on the z-score is not sufficient: for differently distributed scores, this can decrease recall. A well-set cut-off parameter can, however, significantly increase precision, especially if there are few highly relevant tweets. Query-independent filtering does not help for result lists that are already small. With a high occurrence of links in relevant tweets, we found that link retrieval helps improve precision and recall for highly relevant and relevant tweets. Future work should focus on a score-distribution-dependent selection criterion.
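The effect of a z-score cut-off on differently distributed scores can be illustrated with a minimal sketch. The abstract does not say how the cut-off was computed, so everything here is an assumption made for illustration: the function name `zscore_cutoff`, the threshold of 1.0, the use of the population standard deviation, and the toy scores.

```python
from statistics import mean, pstdev


def zscore_cutoff(scored, threshold=1.0):
    """Keep (doc, score) pairs whose score lies at least `threshold`
    standard deviations above the mean score of the result list.

    Hypothetical helper: the threshold value and the use of the
    population standard deviation are illustrative assumptions.
    """
    scores = [s for _, s in scored]
    if not scores:
        return []
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores are identical, so z-scores are undefined: keep everything.
        return list(scored)
    return [(doc, s) for doc, s in scored if (s - mu) / sigma >= threshold]


# Peaked distribution: one tweet clearly stands out from the rest.
peaked = [("t1", 9.0), ("t2", 1.0), ("t3", 1.0), ("t4", 1.0), ("t5", 1.0)]
# Evenly spread distribution: scores decrease gradually.
flat = [("t1", 5.0), ("t2", 4.0), ("t3", 3.0), ("t4", 2.0), ("t5", 1.0)]

kept_peaked = zscore_cutoff(peaked)
kept_flat = zscore_cutoff(flat)
print(kept_peaked)
print(kept_flat)
```

With the peaked scores the cut-off isolates the single outlier, which favors precision. With the evenly spread scores the same threshold also discards the second-ranked tweet, which costs recall if that tweet is relevant.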

In this year’s Entity track participation we focused on the Entity List Completion (ELC) task. We experimented with a text-based and a link-based approach to retrieve entities in Linked Data (LD). Additionally, we experimented with selecting candidate entities from a web corpus. Our intuition is that entities occurring on pages with many of the example entities are more likely to be good candidates than entities that do not. For the Entity track there are no analyses or conclusions to report yet; at the time of writing no evaluation results are available.

  • [PDF] M. Bron, E. Meij, M. Peetz, M. Tsagkias, and M. de Rijke, “Team COMMIT at TREC 2011,” in The twentieth text retrieval conference, 2012.
    [Bibtex]
    @inproceedings{TREC:2011:commit,
    Author = {Bron, Marc and Meij, Edgar and Peetz, Maria-Hendrike and Tsagkias, Manos and de Rijke, Maarten},
    Booktitle = {The Twentieth Text REtrieval Conference},
    Date-Added = {2011-10-22 12:22:19 +0200},
    Date-Modified = {2012-10-30 09:26:12 +0000},
    Series = {TREC 2011},
    Title = {Team {COMMIT} at {TREC 2011}},
    Year = {2012}}

DIR 2011: The Eleventh Dutch-Belgian Information Retrieval Workshop

The 11th edition of the annual Dutch-Belgian Information Retrieval workshop (DIR 2011) took place on February 4 in Amsterdam. It was organized by the University of Amsterdam and the Centrum Wiskunde & Informatica. The focus of this year’s workshop was on interaction, with the goal of facilitating and increasing interaction, especially within the local research community, and between industry and academia. The scientific program included demos, research papers, and compressed contributions. The keynotes by Nick Belkin and Gabriella Kazai provided intriguing outlooks on the future of IR evaluation.

  • [PDF] C. Boscarino, K. Hofmann, V. B. Jijkoun, E. Meij, M. de Rijke, and W. Weerkamp, “Workshop report: Dutch-Belgian information retrieval,” SIGIR Forum, vol. 45, iss. 1, pp. 42-44, 2011.
    [Bibtex]
    @article{forum:2011:dir,
    Author = {Boscarino, C. and Hofmann, K. and Jijkoun, V.B. and Meij, E. and de Rijke, M. and Weerkamp, W.},
    Chapter = {42},
    Date-Added = {2011-10-20 10:48:47 +0200},
    Date-Modified = {2011-10-20 10:48:52 +0200},
    Journal = {SIGIR Forum},
    Number = {1},
    Pages = {42-44},
    Title = {Workshop report: Dutch-Belgian Information Retrieval},
    Volume = {45},
    Year = {2011}}


The University of Amsterdam at TREC 2010: Session, Entity, and Relevance Feedback

We describe the participation of the University of Amsterdam’s ILPS group in the Session, Entity, and Relevance Feedback tracks at TREC 2010. In the Session track we explore the use of blind relevance feedback to bias a follow-up query towards or against the topics covered in documents returned to the user in response to the original query. In the Entity track REF task we experiment with a window size parameter to limit the amount of context considered by the entity co-occurrence models and explore the use of Freebase for type filtering, entity normalization and homepage finding. In the ELC task we rank candidates by the number of links shared between candidate and example entities. In the Relevance Feedback track we experiment with a novel model that uses Wikipedia to expand the query language model.
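The link-based ELC ranking described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how shared links are counted or how ties are broken, so the function `rank_by_shared_links`, the choice to sum overlaps across all example entities, the alphabetical tie-break, and the toy link sets are all hypothetical.

```python
def rank_by_shared_links(candidate_links, example_links):
    """Rank candidate entities by how many links they share with examples.

    Both arguments map an entity name to the set of links (for instance
    linked page identifiers) associated with it. The score of a candidate
    is the sum, over all example entities, of the size of the link overlap.
    Summing over examples and breaking ties by name are assumptions.
    """
    scores = {}
    for candidate, links in candidate_links.items():
        scores[candidate] = sum(
            len(set(links) & set(ex_links)) for ex_links in example_links.values()
        )
    # Highest score first; alphabetical order for equal scores.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data: link identifiers are arbitrary placeholders.
examples = {"ex1": {"a", "b", "c"}, "ex2": {"b", "c", "d"}}
candidates = {"c1": {"b", "c"}, "c2": {"a", "x"}, "c3": {"y"}}

ranking = rank_by_shared_links(candidates, examples)
print(ranking)
```

Here `c1` shares two links with each example entity and is ranked first, `c2` shares a single link, and `c3` shares none.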

  • [PDF] M. Bron, J. He, K. Hofmann, E. Meij, M. de Rijke, E. Tsagkias, and W. Weerkamp, “The University of Amsterdam at TREC 2010: session, entity, and relevance feedback,” in The nineteenth text retrieval conference, 2011.
    [Bibtex]
    @inproceedings{TREC:2011:bron,
    Abstract = {We describe the participation of the University of Amsterdam's ILPS group in the session, entity, and relevance feedback track at TREC 2010. In the Session Track we explore the use of blind relevance feedback to bias a follow-up query towards or against the topics covered in documents returned to the user in response to the original query. In the Entity Track REF task we experiment with a window size parameter to limit the amount of context considered by the entity co-occurrence models and explore the use of Freebase for type filtering, entity normalization and homepage finding. In the ELC task we use an approach that uses the number of links shared between candidate and example entities to rank candidates. In the Relevance Feedback Track we experiment with a novel model that uses Wikipedia to expand the query language model.},
    Author = {M. Bron and He, J. and Hofmann, K. and Meij, E. and de Rijke, M. and Tsagkias, E. and Weerkamp, W.},
    Booktitle = {The Nineteenth Text REtrieval Conference},
    Date-Added = {2011-10-20 11:18:35 +0200},
    Date-Modified = {2012-10-30 09:25:06 +0000},
    Series = {TREC 2010},
    Title = {{The University of Amsterdam at TREC 2010}: Session, Entity, and Relevance Feedback},
    Year = {2011}}

Heuristic Ranking and Diversification of Web Documents

We describe the participation of the University of Amsterdam’s Intelligent Systems Lab in the Web track at TREC 2009. We participated in the ad hoc and diversity tasks. We find that spam is an important issue in the ad hoc task and that Wikipedia-based heuristic optimization approaches help to boost retrieval performance, which is assumed to potentially reduce spam in the top-ranked results. As for the diversity task, we explored different methods. Clustering and a topic-model-based approach perform similarly, and both perform better than a query-log-based approach.

  • [PDF] J. He, K. Balog, K. Hofmann, E. Meij, M. de Rijke, E. Tsagkias, and W. Weerkamp, “Heuristic ranking and diversification of web documents,” in The eighteenth text retrieval conference, 2010.
    [Bibtex]
    @inproceedings{TREC:2010:he,
    Abstract = {We describe the participation of the University of Amsterdam's Intelligent Systems Lab in the web track at TREC 2009. We participated in the adhoc and diversity task. We find that spam is an important issue in the ad hoc task and that Wikipedia-based heuristic optimization approaches help to boost the retrieval performance, which is assumed to potentially reduce spam in the top ranked results. As for the diversity task, we explored different methods. Clustering and a topic model-based approach have a similar performance and both are relatively better than a query log based approach.},
    Author = {He, J. and Balog, K. and Hofmann, K. and Meij, E. and de Rijke, M. and Tsagkias, E. and Weerkamp, W.},
    Booktitle = {The Eighteenth Text REtrieval Conference},
    Date-Added = {2011-10-20 09:45:15 +0200},
    Date-Modified = {2012-10-30 09:24:20 +0000},
    Series = {TREC 2009},
    Title = {Heuristic Ranking and Diversification of Web Documents},
    Year = {2010}}

A Semantic Perspective on Query Log Analysis

We present our views on the CLEF log file analysis task. We argue for a task definition that focuses on the semantic enrichment of query logs. In addition, we discuss how additional information about the context in which queries are being made could further our understanding of users’ information seeking and how to better facilitate this process.

  • [PDF] K. Hofmann, M. de Rijke, B. Huurnink, and E. Meij, “A semantic perspective on query log analysis,” in Working notes for the CLEF 2009 workshop, 2009.
    [Bibtex]
    @inproceedings{CLEF:2009:hofmann,
    Abstract = {We present our views on the CLEF log file analysis task. We argue for a task definition that focuses on the semantic enrichment of query logs. In addition, we discuss how additional information about the context in which queries are being made could further our understanding of users' information seeking and how to better facilitate this process. },
    Author = {Hofmann, K. and de Rijke, M. and Huurnink, B. and Meij, E.},
    Booktitle = {Working Notes for the CLEF 2009 Workshop},
    Date-Added = {2011-10-17 09:46:16 +0200},
    Date-Modified = {2011-10-17 09:46:16 +0200},
    Title = {A Semantic Perspective on Query Log Analysis},
    Year = {2009}}