TREC KBA Archives - Edgar Meij

Document Filtering for Long-tail Entities

Filtering relevant documents with respect to entities is an essential task in the context of knowledge base construction and maintenance. It entails processing a time-ordered stream of documents that might be relevant to an entity in order to select only those that contain vital information. State-of-the-art approaches to document filtering…

Time-Aware Chi-squared for Document Filtering over Time

22/07/2013 Publications Workshop Papers

To appear at TAIA2013 (a SIGIR 2013 workshop). Document filtering over time is widely applied in various tasks such as tracking topics in online news or social media. We consider it a classification task, where topics of interest correspond to classes, and the feature space consists of the words associated…

TREC 2012 summary

09/11/2012 Blog

In the 21st Text REtrieval Conference (TREC 2012), seven tracks ran: KBA, Contextual Suggestion, Session, Web, Medical, Crowdsourcing, and Microblog. Of these, Microblog attracted the largest number of participating groups (40), followed by Medical (24).

The University of Amsterdam at TREC 2012

09/11/2012 Blog Publications Unrefereed

This year, the Information and Language Processing Systems (ILPS) group of the University of Amsterdam participated in the Microblog and Knowledge Base Acceleration (KBA) tracks.

Hadoop code for TREC KBA

24/07/2012 Blog

I’ve decided to put some of the Hadoop code I developed for the TREC KBA task online. It’s available on GitHub: https://github.com/ejmeij/trec-kba. In particular, it provides classes to read and write topic files, read and write run files, and expose the documents in the Thrift files as Hadoop-readable objects (‘ThriftFileInputFormat’) that can be used as input to mappers. I obviously also…