2012 - Edgar Meij

TREC 2012 summary

09/11/2012 Blog

Seven tracks ran in the 21st Text REtrieval Conference (TREC 2012): KBA, Contextual Suggestion, Session, Web, Medical, Crowdsourcing, and Microblog. Of these, Microblog attracted the largest number of participating groups (40), followed by Medical (24).

The University of Amsterdam at TREC 2012

09/11/2012 Blog Publications Unrefereed

This year the Information and Language Processing Systems (ILPS) group of the University of Amsterdam participated in the Microblog and the Knowledge Base Acceleration (KBA) tracks.

Getting ready for Horizon 2020 – Workshop notes

01/10/2012 Blog

These are my notes from the “Getting ready for Horizon 2020” workshop, presented by Dr. Seán McCarthy of Hyperion Ltd at CWI on October 1, 2012.

Overview of RepLab 2012: Evaluating Online Reputation Management Systems

20/09/2012 Blog Publications Unrefereed

This paper summarizes the goals, organization, and results of the first RepLab competitive evaluation campaign for Online Reputation Management Systems (RepLab 2012). RepLab focused on the reputation of companies, and asked participant systems to annotate different types of information on tweets containing the names of several companies. Two tasks were proposed: a profiling task, where…

Generating Pseudo Test Collections for Learning to Rank Scientific Articles

19/09/2012 Blog Conference Papers Publications

Pseudo test collections are automatically generated to provide training material for learning to rank methods. We propose a method for generating pseudo test collections in the domain of digital libraries, where data is relatively sparse, but comes with rich annotations. Our intuition is that documents are annotated to make them…

Hadoop code for TREC KBA

24/07/2012 Blog

I’ve decided to put some of the Hadoop code I developed for the TREC KBA task online. It’s available on GitHub: https://github.com/ejmeij/trec-kba. In particular, it provides classes to read and write topic files, read and write run files, and expose the documents in the Thrift files as Hadoop-readable objects (‘ThriftFileInputFormat’) to be used as input to mappers. I obviously also…

OpenGeist: Insight in the Stream of Page Views on Wikipedia

03/07/2012 Publications Workshop Papers

We present a RESTful interface that captures insights into the zeitgeist of Wikipedia users. In recent years many so-called zeitgeist applications have been launched. Such applications are used to gain insights into the current gist of society and current affairs. Several news sources run zeitgeist applications for popular and trending news.…

Identifying Entity Aspects in Microblog Posts

03/05/2012 Blog Conference Papers Publications

Online reputation management is about monitoring and handling the public image of entities (such as companies) on the Web. An important task in this area is identifying aspects of the entity of interest (such as products, services, competitors, key people, etc.) given a stream of microblog posts referring to the…

A Corpus for Entity Profiling in Microblog Posts

29/03/2012 Blog Publications Workshop Papers

Microblogs have become an invaluable source of information for online reputation management. An emerging problem in this field is identifying the key aspects of an entity that are commented on in microblog posts. Streams of microblogs are of great value because of their direct and…

Search Engines of the Future (Zoekmachines van de toekomst)

12/02/2012 Blog Publications Publicity

There is some debate about what the logical successor will be to web 2.0, in which user-generated content, information sharing, and interoperability took center stage. Although several ideas are circulating, there is broad support for equating web 3.0 with the semantic web. The guiding idea…