CIKM 2016

Document Filtering for Long-tail Entities

Filtering relevant documents with respect to entities is an essential task in the context of knowledge base construction and maintenance. It entails processing a time-ordered stream of documents that might be relevant to an entity in order to select only those that contain vital information. State-of-the-art approaches to document filtering…
TREC

TREC 2012 summary

In the 21st Text REtrieval Conference (TREC 2012), seven tracks ran: KBA, Contextual suggestion, Session, Web, Medical, Crowdsourcing, and Microblog. Of these, Microblog attracted the largest number of participating groups (40), closely followed by Medical (24).

Hadoop code for TREC KBA

I’ve decided to put some of the Hadoop code I developed for the TREC KBA task online. It’s available on GitHub: https://github.com/ejmeij/trec-kba. In particular, it provides classes to read and write topic files, read and write run files, and expose the documents in the Thrift files as Hadoop-readable objects (‘ThriftFileInputFormat’) that can be used as input to mappers. I obviously also…