Plot of a query-specific burst

Adaptive Temporal Query Modeling

We present an approach to query modeling that uses the temporal distribution of documents in an initially retrieved set of documents. Such distributions tend to exhibit bursts, especially in news-related document collections. We hypothesize that documents in those bursts are more likely to be relevant than others. Predicated on this, we expand queries with the most distinguishing terms in high-quality documents sampled from bursts. We show how the most commonly used decay function for recent document retrieval can be used as a probabilistic model for temporal retrieval in general. The effectiveness of our models is demonstrated on both news collections and a collection of blog posts.
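
As a rough illustration of the burst idea in the abstract, the sketch below bins an initially retrieved set of documents by day and keeps the documents that fall on unusually busy days. The threshold rule (mean plus a multiple of the standard deviation) and the names `find_bursts`, `docs_in_bursts`, and `num_std` are assumptions made for this example; the paper's own burst definition and term selection are not reproduced here.

```python
from collections import Counter
from statistics import mean, pstdev


def find_bursts(doc_days, num_std=1.0):
    """Return the days whose document count exceeds mean + num_std * std.

    doc_days: list of integer day indices, one per retrieved document.
    The threshold rule is an illustrative assumption, not the paper's definition.
    """
    if not doc_days:
        return set()
    counts = Counter(doc_days)
    # Include empty days between the first and last document,
    # so that quiet periods pull the average down.
    days = range(min(doc_days), max(doc_days) + 1)
    series = [counts.get(d, 0) for d in days]
    threshold = mean(series) + num_std * pstdev(series)
    return {d for d, c in zip(days, series) if c > threshold}


def docs_in_bursts(docs, num_std=1.0):
    """docs: list of (doc_id, day) pairs in rank order; keep those inside a burst."""
    bursts = find_bursts([day for _, day in docs], num_std)
    return [doc_id for doc_id, day in docs if day in bursts]
```

Documents returned by `docs_in_bursts` would then be the candidates from which expansion terms are drawn; how those terms are weighted is left open here.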

  • [PDF] M. Peetz, E. Meij, M. de Rijke, and W. Weerkamp, “Adaptive temporal query modeling,” in Advances in Information Retrieval – 34th European Conference on IR Research, ECIR 2012, 2012.
    [Bibtex]
    @inproceedings{ECIR:2012:peetz,
    Author = {Peetz, Maria-Hendrike and Meij, Edgar and de Rijke, Maarten and Weerkamp, Wouter},
    Booktitle = {Advances in Information Retrieval - 34th European Conference on IR Research, ECIR 2012},
    Date-Added = {2011-11-23 18:10:40 +0100},
    Date-Modified = {2012-10-28 23:01:12 +0000},
    Title = {Adaptive Temporal Query Modeling},
    Year = {2012}}

A Framework for Unsupervised Spam Detection in Social Networking Sites

Social networking sites offer users the option to submit user spam reports for a given message, indicating that the message is inappropriate. In this paper we present a framework that uses these user spam reports for spam detection. The framework is based on the HITS web link analysis framework and is instantiated in three models. The models successively introduce propagation between messages reported by the same user, messages authored by the same user, and messages with similar content. Each of the models can also be converted to a simple semi-supervised scheme. We test our models on data from a popular social network and compare the models to two baselines, based on message content and raw report counts. We find that our models outperform both baselines and that each of the additions (reporters, authors, and similar messages) further improves the performance of the framework.
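
To make the HITS connection concrete, here is a minimal sketch of plain HITS run on a bipartite graph of spam reports, with reporters in the hub role and messages in the authority role. It covers only propagation through shared reporters; the author and content-similarity extensions and the semi-supervised variant are not attempted. The function name `hits_spam_scores`, the input format, and the fixed iteration count are assumptions for illustration.

```python
import math


def hits_spam_scores(reports, iterations=50):
    """Run basic HITS on (reporter_id, message_id) report pairs.

    Returns (message_scores, reporter_scores), each L2-normalized.
    A sketch of standard HITS only, not the paper's three models.
    """
    reports = set(reports)  # ignore duplicate reports
    reporters = {r for r, _ in reports}
    messages = {m for _, m in reports}
    reporter_score = {r: 1.0 for r in reporters}
    message_score = {m: 1.0 for m in messages}

    def normalize(scores):
        norm = math.sqrt(sum(v * v for v in scores.values()))
        return {k: v / norm for k, v in scores.items()} if norm else scores

    for _ in range(iterations):
        # Authority update: a message scores high if reported by strong reporters.
        new_msg = {m: 0.0 for m in messages}
        for r, m in reports:
            new_msg[m] += reporter_score[r]
        message_score = normalize(new_msg)

        # Hub update: a reporter scores high if the reported messages score high.
        new_rep = {r: 0.0 for r in reporters}
        for r, m in reports:
            new_rep[r] += message_score[m]
        reporter_score = normalize(new_rep)

    return message_score, reporter_score
```

Ranking messages by `message_scores` differs from ranking by raw report counts, because a report from a reporter whose other reports also point at high-scoring messages carries more weight.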

  • [PDF] M. Bosma, E. Meij, and W. Weerkamp, “A framework for unsupervised spam detection in social networking sites,” in Advances in Information Retrieval – 34th European Conference on IR Research, ECIR 2012, 2012.
    [Bibtex]
    @inproceedings{ECIR:2012:bosma,
    Author = {Bosma, Maarten and Meij, Edgar and Weerkamp, Wouter},
    Booktitle = {Advances in Information Retrieval - 34th European Conference on IR Research, ECIR 2012},
    Date-Added = {2011-11-23 18:10:33 +0100},
    Date-Modified = {2012-10-28 23:00:37 +0000},
    Title = {A Framework for Unsupervised Spam Detection in Social Networking Sites},
    Year = {2012}}