[Figure: hits per time of day]

People searching for people: analysis of a people search engine log

Recent years have seen increasing interest in vertical search: searching within a particular type of information. Understanding what people search for in these “verticals” gives direction to research and provides pointers for the search engines themselves. In this paper we analyze the search logs of one particular vertical: people search engines. Based on an extensive analysis of the logs of a search engine geared towards finding people, we propose a classification scheme for people search at three levels: (a) queries, (b) sessions, and (c) users. For queries, we identify three types: (i) event-based high-profile queries (people who become “popular” because of an event), (ii) regular high-profile queries (celebrities), and (iii) low-profile queries (other, less-known people). We present experiments on the automatic classification of queries. At the session level, we observe five types: (i) family sessions (users looking for relatives), (ii) event sessions (querying the main players of an event), (iii) spotting sessions (trying to “spot” different celebrities online), (iv) polymerous sessions (sessions without a clear relation between queries), and (v) repetitive sessions (query refinement and copying). Finally, for users we identify four types: (i) monitors, (ii) spotters, (iii) followers, and (iv) polymers.

Our findings not only offer insight into search behavior in people search engines, but they are also useful to identify future research directions and to provide pointers for search engine improvements.
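To make the three query types concrete, here is a toy rule-based sketch. It is not the classifier from the paper's experiments. It assumes each person-name query comes with a series of daily hit counts. The function name `classify_query` and the thresholds `min_total` and `burst_ratio` are hypothetical choices for illustration. The rules are:

- low total volume is labelled low-profile;
- high volume concentrated in one spike is labelled event-based high-profile;
- high, steady volume is labelled regular high-profile.

```python
from statistics import mean


def classify_query(daily_counts: list[int],
                   min_total: int = 100,
                   burst_ratio: float = 5.0) -> str:
    """Label a person-name query from its daily hit counts (toy heuristic).

    min_total:   hypothetical threshold (must be positive); fewer total
                 hits than this => low-profile.
    burst_ratio: hypothetical threshold on peak day / mean day; at or
                 above it the query is treated as event-driven.
    """
    if not daily_counts:
        raise ValueError("need at least one day of counts")
    total = sum(daily_counts)
    if total < min_total:
        return "low-profile"
    # A single day far above the average suggests interest triggered by an event.
    if max(daily_counts) / mean(daily_counts) >= burst_ratio:
        return "event-based high-profile"
    # Consistently high volume without a dominant spike: a regular celebrity.
    return "regular high-profile"


print(classify_query([40] * 10))          # regular high-profile
print(classify_query([1] * 9 + [300]))    # event-based high-profile
print(classify_query([1, 0, 2, 0, 1]))    # low-profile
```

With n days of data the ratio of peak day to mean day is at most n. This means `burst_ratio` only makes sense relative to the length of the observation window.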

  • W. Weerkamp, R. Berendsen, B. Kovachev, E. Meij, K. Balog, and M. de Rijke, “People searching for people: analysis of a people search engine log,” in Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information, 2011.
    Author = {Weerkamp, Wouter and Berendsen, Richard and Kovachev, Bogomil and Meij, Edgar and Balog, Krisztian and de Rijke, Maarten},
    Booktitle = {Proceedings of the 34th international ACM SIGIR conference on Research and development in Information},
    Date-Added = {2011-10-20 10:50:25 +0200},
    Date-Modified = {2012-10-30 08:41:27 +0000},
    Series = {SIGIR 2011},
    Title = {People searching for people: analysis of a people search engine log},
    Year = {2011},
    Bdsk-Url-1 = {http://doi.acm.org/10.1145/2009916.2009927}}


Investigating the Demand Side of Semantic Search through Query Log Analysis

Semantic search, in its broadest definition, is a collection of approaches that aim to match the Web’s content with the information needs of Web users at a semantic level. Most work in this area has focused on the supply side of semantic search. In particular, it has focused on elevating Web content to the semantic level, either by relying on information extraction methods or by working with explicit metadata embedded in or linked to Web resources. With respect to explicit metadata, several studies have examined the adoption of Semantic Web formats in the wild, mostly based on statistics from the crawls of Semantic Web search engines. Much less effort has focused on the demand side of semantic search, i.e., interpreting queries at the semantic level and studying information needs at this level. Consequently, little is known about how well the supply of metadata actually matches the demand for information on the Web.

In this paper, we address the problem of studying the information needs of Web searchers at an ontological level, i.e., in terms of the particular attributes of objects they are interested in. We describe a set of methods for extracting, from a Web search query log, the context words associated with certain classes of objects. We do so based on the idea that common context words reflect the aspects of objects that users are interested in. We implement these methods in an interactive tool called the Semantic Search Assist. The original purpose of this tool was to generate type-based query suggestions when there is not enough statistical evidence for entity-based query suggestions. From an ontology engineering perspective, however, the tool answers a further question: what attributes would a class of objects have if its ontology were engineered purely from the information needs of end users? As such, it allows us to reflect on the gap between the properties defined in Semantic Web ontologies and the attributes of objects that people search for on the Web. We evaluate the tool by measuring its predictive power on the query log itself. We leave the study of the gap between particular information needs and Semantic Web data for future work.
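The core idea is that words co-occurring with entities of a class reveal the attributes users care about. It can be sketched as follows. This is a simplified illustration, not the Semantic Search Assist implementation. It rests on several assumptions:

- a small hypothetical dictionary `entity_class` maps lowercase entity names to class labels;
- entities are matched as whole words, preferring the longest match;
- the remaining query words are counted per class.

The function name `context_words` and the example log are also hypothetical.

```python
import re
from collections import Counter, defaultdict


def context_words(queries, entity_class):
    """Count the words that co-occur with entities of each class.

    queries:      iterable of raw query strings.
    entity_class: hypothetical dict, lowercase entity name -> class label.
    Returns a dict mapping class label -> Counter of context words.
    """
    counts = defaultdict(Counter)
    # Try longer entity names first so a longer name wins over its prefix.
    entities = sorted(entity_class, key=len, reverse=True)
    for query in queries:
        q = query.lower()
        for entity in entities:
            pattern = r"\b" + re.escape(entity) + r"\b"
            if re.search(pattern, q):
                # Strip the entity mention; the leftover words are its context.
                rest = re.sub(pattern, " ", q)
                counts[entity_class[entity]].update(rest.split())
                break  # attribute each query to its first (longest) match only
    return counts


entity_class = {"paris": "city", "berlin": "city", "nikon d90": "camera"}
log = ["paris hotels", "hotels in berlin", "berlin weather",
       "nikon d90 review", "nikon d90 price", "paris weather"]
ctx = context_words(log, entity_class)
print(ctx["city"].most_common(2))  # [('hotels', 2), ('weather', 2)]
```

Function words such as "in" are counted too. A realistic version would at least filter stopwords before ranking the context words of a class.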

  • E. Meij, P. Mika, and H. Zaragoza, “Investigating the demand side of semantic search through query log analysis,” in Proceedings of the Workshop on Semantic Search (SemSearch 2009) at the 18th International World Wide Web Conference (WWW 2009), 2009.
    Author = {Edgar Meij and P. Mika and H. Zaragoza},
    Booktitle = {Proceedings of the Workshop on Semantic Search (SemSearch 2009) at the 18th International World Wide Web Conference (WWW 2009)},
    Date-Added = {2011-10-12 18:31:55 +0200},
    Date-Modified = {2012-10-30 08:43:47 +0000},
    Title = {Investigating the Demand Side of Semantic Search through Query Log Analysis},
    Year = {2009}}