Plot of a query-specific burst

Adaptive Temporal Query Modeling

We present an approach to query modeling that uses the temporal distribution of documents in an initially retrieved set of documents. Such distributions tend to exhibit bursts, especially in news-related document collections. We hypothesize that documents in those bursts are more likely to be relevant than others. Predicated on this, we expand queries with the most distinguishing terms in high-quality documents sampled from bursts. We show how the most commonly used decay function for recent document retrieval can be used as a probabilistic model for temporal retrieval in general. The effectiveness of our models is demonstrated on both news collections and a collection of blog posts.
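
As a rough illustration of the idea in the abstract above, the following sketch bins the publication days of an initially retrieved document set, flags days with unusually many documents as bursts, and computes an exponential recency weight. The thresholding rule (mean plus a multiple of the standard deviation), the function names `find_burst_days` and `exponential_decay`, and the default parameter values are assumptions made for illustration; they are not taken from the paper.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def find_burst_days(doc_days, num_std=1.0):
    """Return the set of days whose document count is unusually high.

    doc_days: list of integer day indices, one per retrieved document.
    A day counts as a burst when its count exceeds
    mean + num_std * standard deviation (an illustrative assumption).
    """
    counts = Counter(doc_days)
    first, last = min(doc_days), max(doc_days)
    # Include empty days so that quiet periods lower the average.
    series = [counts.get(day, 0) for day in range(first, last + 1)]
    threshold = mean(series) + num_std * pstdev(series)
    return {day for day in range(first, last + 1) if counts.get(day, 0) > threshold}


def exponential_decay(age_in_days, rate=0.01):
    """Exponential decay weight: rate * exp(-rate * age).

    A common general form for recency weighting; the rate value here
    is hypothetical.
    """
    return rate * math.exp(-rate * age_in_days)
```

Documents published on the returned burst days would then be the candidates from which expansion terms are sampled; how those terms are scored is left out of this sketch.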

 • [PDF] M. Peetz, E. Meij, M. de Rijke, and W. Weerkamp, “Adaptive temporal query modeling,” in Advances in Information Retrieval – 34th European Conference on IR Research, ECIR 2012, 2012.
  [Bibtex]
  @inproceedings{ECIR:2012:peetz,
  Author = {Peetz, Maria-Hendrike and Meij, Edgar and de Rijke, Maarten and Weerkamp, Wouter},
  Booktitle = {Advances in Information Retrieval - 34th European Conference on IR Research, ECIR 2012},
  Date-Added = {2011-11-23 18:10:40 +0100},
  Date-Modified = {2012-10-28 23:01:12 +0000},
  Title = {Adaptive Temporal Query Modeling},
  Year = {2012}}
social media icons

A Framework for Unsupervised Spam Detection in Social Networking Sites

Social networking sites offer users the option to submit user spam reports for a given message, indicating this message is inappropriate. In this paper we present a framework that uses these user spam reports for spam detection. The framework is based on the HITS web link analysis framework and is instantiated in three models. The models subsequently introduce propagation between messages reported by the same user, messages authored by the same user, and messages with similar content. Each of the models can also be converted to a simple semi-supervised scheme. We test our models on data from a popular social network and compare the models to two baselines, based on message content and raw report counts. We find that our models outperform both baselines and that each of the additions (reporters, authors, and similar messages) further improves the performance of the framework.
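
The core mechanism, HITS-style mutual reinforcement between reporters and the messages they report, can be sketched as follows. This is a minimal sketch of plain HITS on a bipartite report graph only: it does not reproduce the three models from the paper (propagation through authors or similar content is omitted), and the function name `hits_spam_scores` and the input format are assumptions.

```python
import math


def hits_spam_scores(reports, iterations=20):
    """Run a HITS-style iteration over (reporter, message) pairs.

    Returns (reporter_scores, message_scores), each L2-normalized.
    """
    reporters = {r for r, _ in reports}
    messages = {m for _, m in reports}
    reporter_score = {r: 1.0 for r in reporters}
    message_score = {m: 1.0 for m in messages}
    for _ in range(iterations):
        # A message scores higher when it is reported by high-scoring reporters.
        new_message = {m: 0.0 for m in messages}
        for r, m in reports:
            new_message[m] += reporter_score[r]
        norm = math.sqrt(sum(v * v for v in new_message.values()))
        message_score = {m: v / norm for m, v in new_message.items()}
        # A reporter scores higher when their reports hit high-scoring messages.
        new_reporter = {r: 0.0 for r in reporters}
        for r, m in reports:
            new_reporter[r] += message_score[m]
        norm = math.sqrt(sum(v * v for v in new_reporter.values()))
        reporter_score = {r: v / norm for r, v in new_reporter.items()}
    return reporter_score, message_score
```

Ranking messages by the returned message score gives an unsupervised ordering in which messages reported by several otherwise reliable reporters rise to the top.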

 • [PDF] M. Bosma, E. Meij, and W. Weerkamp, “A framework for unsupervised spam detection in social networking sites,” in Advances in Information Retrieval – 34th European Conference on IR Research, ECIR 2012, 2012.
  [Bibtex]
  @inproceedings{ECIR:2012:bosma,
  Author = {Bosma, Maarten and Meij, Edgar and Weerkamp, Wouter},
  Booktitle = {Advances in Information Retrieval - 34th European Conference on IR Research, ECIR 2012},
  Date-Added = {2011-11-23 18:10:33 +0100},
  Date-Modified = {2012-10-28 23:00:37 +0000},
  Title = {A Framework for Unsupervised Spam Detection in Social Networking Sites},
  Year = {2012}}
Onszelf voorbij (Beyond Ourselves)

We-words on websites: search engines for humanities scholars

Many in our society feel that, in the process of integration and multiculturalism, we have gone well and truly beyond ourselves. Over the past ten years the tone of the debate in the media and on the internet has changed completely. The government proclaims that the multicultural society has failed and is therefore being abolished. Disadvantaged ethnic groups are left to fend for themselves, and populist statements go down well, sometimes with extreme consequences. Anyone who thinks otherwise is soft and surely belongs to the ‘left-wing church’.

A team of researchers is investigating how theatre companies, political parties, churches and other groups draw and cross their boundaries. How do they imagine the world beyond their own borders? As the domain of a threatening other? Or does it hold fallow ground full of possibilities for projects of their own that have not yet been realized? And what political consequences do these different conceptions have? Throughout, the question is: does looking beyond the boundaries of one's own identity bring gain or loss? Medieval mapmakers did not know the whole world and often simply left the unknown parts blank. ‘Here be dragons’ or ‘where the lions are’, they would write there, but sometimes also ‘Eldorado’ or even ‘Paradise’. Onszelf voorbij is about changing communities today. And above all about the question of whether looking beyond the boundaries of one's own identity brings loss or gain. Many in society feel that we have gone completely beyond ourselves in the process of integration and multiculturalism. Others think that we have yet to take such a step ‘beyond our own yard’.

Onszelf voorbij is about changing communities today. Where do I belong? And who belongs with me? That used to be clear: church, trade union, party and family drew the boundaries. Now these forms of belonging are in rapid flux. Have we become afraid of what lies beyond the boundaries of our own group? Or do we dare to step across that border, beyond ourselves?

Click here for more information.

Screenshot of the analysis webtool

Women’s views on consent, counseling and confidentiality in PMTCT: a mixed-methods study in four African countries

Accepted subject to revisions.

Ambitious UN goals to reduce the mother-to-child transmission of HIV have not been met in much of Sub-Saharan Africa. This paper focuses on the quality of information provision and counseling and disclosure patterns in Burkina Faso, Kenya, Malawi and Uganda to identify how services can be improved to enable better PMTCT outcomes.

hits per time of day

People searching for people: analysis of a people search engine log

Recent years show an increasing interest in vertical search: searching within a particular type of information. Understanding what people search for in these “verticals” gives direction to research and provides pointers for the search engines themselves. In this paper we analyze the search logs of one particular vertical: people search engines. Based on an extensive analysis of the logs of a search engine geared towards finding people, we propose a classification scheme for people search at three levels: (a) queries, (b) sessions, and (c) users. For queries, we identify three types, (i) event-based high-profile queries (people that become “popular” because of an event happening), (ii) regular high-profile queries (celebrities), and (iii) low-profile queries (other, less-known people). We present experiments on automatic classification of queries. On the session level, we observe five types: (i) family sessions (users looking for relatives), (ii) event sessions (querying the main players of an event), (iii) spotting sessions (trying to “spot” different celebrities online), (iv) polymerous sessions (sessions without a clear relation between queries), and (v) repetitive sessions (query refinement and copying). Finally, for users we identify four types: (i) monitors, (ii) spotters, (iii) followers, and (iv) polymers.

Our findings not only offer insight into search behavior in people search engines, but they are also useful to identify future research directions and to provide pointers for search engine improvements.

 • [PDF] W. Weerkamp, R. Berendsen, B. Kovachev, E. Meij, K. Balog, and M. de Rijke, “People searching for people: analysis of a people search engine log,” in Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information, 2011.
  [Bibtex]
  @inproceedings{sigir:2011:weerkamp,
  Author = {Weerkamp, Wouter and Berendsen, Richard and Kovachev, Bogomil and Meij, Edgar and Balog, Krisztian and de Rijke, Maarten},
  Booktitle = {Proceedings of the 34th international ACM SIGIR conference on Research and development in Information},
  Date-Added = {2011-10-20 10:50:25 +0200},
  Date-Modified = {2012-10-30 08:41:27 +0000},
  Series = {SIGIR 2011},
  Title = {People searching for people: analysis of a people search engine log},
  Year = {2011},
  Bdsk-Url-1 = {http://doi.acm.org/10.1145/2009916.2009927}}


Dynamic term cloud screenshot

Online Religious Studies

Data transitions have revolutionized many scientific disciplines, starting with the exact sciences and then the life sciences; the social sciences and humanities are now making the transition to data-intensive sciences, with descriptions through quantitative measurements. New analysis tools and publicly accessible utterances, opinions, transactions and interactions resulting from widespread Internet and social media usage facilitate new, data-intensive research methods in disciplines that have so far relied on small-scale literature and/or panel-based studies. To illustrate the new possibilities, we report on a pilot carried out by a cross-disciplinary team consisting of computer scientists and researchers in religious studies. In the latter area, research is often focused on mapping out the convictions, hopes, and beliefs of groups of people, be it within certain religions or within any other group, such as those defined by a political party.

In the pilot, religious scholars examined the core keywords in a left-wing political party in order to determine their hopes and beliefs. Rather than following their standard way-of-working, they were equipped with a search engine with an index of content crawled from discussion forums, the party’s web site plus a range of online publications relating to the party and going back to 1990. In this paper we focus on lessons learned and on methodological innovations for religious scholars as well as for computer scientists building the enabling technology.

 • [PDF] J. Bekkenkamp, E. Meij, and M. de Rijke, “Online religious studies,” in Web Science 2011, Koblenz, 2011.
  [Bibtex]
  @inproceedings{websci:2011:meij,
  Abstract = {Data transitions have revolutionized many scientific disciplines, starting with the exact sciences, then the life sciences, and now the social sciences and humanities are in the process of making the transition to becoming data intensive sciences, with descriptions through quantitative measurements. New analysis tools and publicly accessible utterances, opinions, transactions and interactions resulting from widespread internet and social media usage facilitate new, data-intensive research methods in disciplines that have so far relied on small-scale literature and/or panel-based studies. To illustrate the new possibilities, we report on a pilot carried out by a cross-disciplinary team consisting of computer scientists and researchers in religious studies. In the latter area, research is often focused on mapping out the convictions, hopes, and beliefs of groups of people, be it within certain religions or within any other group, such as those defined by a political party.
  In the pilot, religious scholars examined the core keywords in a left-wing political party in order to determine their hopes and beliefs. Rather than following their standard way-of-working, they were equipped with a search engine with an index of content crawled from discussion forums, the party's web site plus a range of online publications relating to the party and going back to 1990. In this paper we focus on lessons learned and on methodological innovations for religious scholars as well as for computer scientists building the enabling technology.},
  Address = {Koblenz},
  Author = {Bekkenkamp, J. and Meij, E. and de Rijke, M.},
  Booktitle = {Web Science 2011},
  Date-Added = {2011-10-20 10:49:41 +0200},
  Date-Modified = {2012-10-30 08:39:02 +0000},
  Title = {Online Religious Studies},
  Year = {2011}}
Classifying People Queries

Classifying Queries Submitted to a Vertical Search Engine

We propose and motivate a scheme for classifying queries submitted to a people search engine. We specify a number of features for automatically classifying people queries into the proposed classes and examine the effectiveness of these features. Our main finding is that classification is feasible and that using information from past searches, clickouts and news sources is important.

 • [PDF] R. Berendsen, B. Kovachev, E. Meij, M. de Rijke, and W. Weerkamp, “Classifying queries submitted to a vertical search engine,” in Web Science 2011, Koblenz, 2011.
  [Bibtex]
  @inproceedings{websci:2011:berendsen,
  Address = {Koblenz},
  Author = {Berendsen, R. and Kovachev, B. and Meij, E. and de Rijke, M. and Weerkamp, W.},
  Booktitle = {Web Science 2011},
  Date-Added = {2011-10-20 10:49:24 +0200},
  Date-Modified = {2012-10-30 08:39:05 +0000},
  Title = {Classifying Queries Submitted to a Vertical Search Engine},
  Year = {2011}}
Dutch Belgian Information Retrieval Workshop logo

DIR 2011: the eleventh Dutch-Belgian information retrieval workshop

The 11th edition of the annual Dutch-Belgian Information Retrieval workshop (DIR 2011) took place on February 4 in Amsterdam. It was organized by the University of Amsterdam and the Centrum Wiskunde & Informatica. The focus of this year’s workshop was on interaction, with the goal of facilitating and increasing interaction, especially within the local research community, and between industry and academia. The scientific program included demos, research papers, and compressed contributions. The keynotes by Nick Belkin and Gabriella Kazai provided intriguing outlooks on the future of IR evaluation.

 • [PDF] C. Boscarino, K. Hofmann, V. B. Jijkoun, E. Meij, M. de Rijke, and W. Weerkamp, “Workshop report: Dutch-Belgian Information Retrieval,” SIGIR Forum, vol. 45, iss. 1, pp. 42-44, 2011.
  [Bibtex]
  @article{forum:2011:dir,
  Author = {Boscarino, C. and Hofmann, K. and Jijkoun, V.B. and Meij, E. and de Rijke, M. and Weerkamp, W.},
  Chapter = {42},
  Date-Added = {2011-10-20 10:48:47 +0200},
  Date-Modified = {2011-10-20 10:48:52 +0200},
  Journal = {SIGIR Forum},
  Number = {1},
  Pages = {42-44},
  Title = {Workshop report: Dutch-Belgian Information Retrieval},
  Volume = {45},
  Year = {2011}}