WSDM

Dynamic Collective Entity Representations for Entity Ranking

Entity ranking, i.e., successfully positioning a relevant entity at the top of the ranking for a given query, is inherently difficult due to the potential mismatch between the entity’s description in a knowledge base and the way people refer to the entity when searching for it. To counter this issue we propose a method for constructing dynamic collective entity representations. We collect entity descriptions from a variety of sources and combine them into a single entity representation by learning to weight the content from different sources that are associated with an entity for optimal retrieval effectiveness. Our method is able to add new descriptions in real time and learn the best representation as time evolves so as to capture the dynamics of how people search for entities. Incorporating dynamic description sources into dynamic collective entity representations improves retrieval effectiveness by 7% over a state-of-the-art learning to rank baseline. Periodic retraining of the ranker enables higher ranking effectiveness for dynamic collective entity representations.
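To make the idea of a weighted, multi-source entity representation concrete, here is a minimal sketch. It is not the method from the paper: the field names (`kb`, `tweets`), the term-overlap scorer, and the fixed weights are all assumptions for illustration, whereas the paper learns the weights from data. It only shows how a description arriving from a new source can change an entity's score for a query.

```python
from collections import Counter


def field_score(query_terms, field_text):
    """Fraction of query terms that occur in one description field."""
    if not query_terms:
        return 0.0
    tokens = Counter(field_text.lower().split())
    return sum(1 for t in query_terms if tokens[t] > 0) / len(query_terms)


def entity_score(query, fields, weights):
    """Weighted sum of per-field match scores for one entity."""
    terms = query.lower().split()
    return sum(weights.get(name, 0.0) * field_score(terms, text)
               for name, text in fields.items())


def add_description(fields, source, text):
    """Append newly arrived text to a field, mimicking a dynamic source."""
    fields[source] = (fields.get(source, "") + " " + text).strip()


# Toy entity with a static knowledge base field and an empty dynamic field.
entity = {"kb": "Barack Obama 44th president of the United States",
          "tweets": ""}
weights = {"kb": 0.6, "tweets": 0.4}  # hypothetical values, not learned

before = entity_score("potus obama", entity, weights)
add_description(entity, "tweets", "potus speaks tonight")
after = entity_score("potus obama", entity, weights)
```

In this toy setup the query term "potus" only matches once the tweet text has been added, so `after` is higher than `before`.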

  • [PDF] D. Graus, M. Tsagkias, W. Weerkamp, E. Meij, and M. de Rijke, “Dynamic collective entity representations for entity ranking,” in Proceedings of the ninth ACM international conference on web search and data mining, 2016.
    [Bibtex]
    @inproceedings{WSDM:2016:Graus,
    Author = {Graus, David and Tsagkias, Manos and Weerkamp, Wouter and Meij, Edgar and de Rijke, Maarten},
    Booktitle = {Proceedings of the ninth ACM international conference on Web search and data mining},
    Date-Added = {2016-01-07 17:24:16 +0000},
    Date-Modified = {2016-01-07 17:25:55 +0000},
    Series = {WSDM 2016},
    Title = {Dynamic Collective Entity Representations for Entity Ranking},
    Year = {2016}}

Mining, ranking and recommending entity aspects

Entity queries constitute a large fraction of web search queries and most of these queries are in the form of an entity mention plus some context terms that represent an intent in the context of that entity. We refer to these entity-oriented search intents as entity aspects. Recognizing entity aspects in a query can improve various search applications such as providing direct answers, diversifying search results, and recommending queries. In this paper we focus on the tasks of identifying, ranking, and recommending entity aspects, and propose an approach that mines, clusters, and ranks such aspects from query logs.

Dynamic query modeling for related content finding

While watching television, people increasingly consume additional content related to what they are watching. We consider the task of finding video content related to a live television broadcast for which we leverage the textual stream of subtitles associated with the broadcast. We model this task as a Markov decision process and propose a method that uses reinforcement learning to directly optimize the retrieval effectiveness of queries generated from the stream of subtitles. Our dynamic query modeling approach significantly outperforms state-of-the-art baselines for stationary query modeling and for text-based retrieval in a television setting. In particular we find that carefully weighting terms and decaying these weights based on recency significantly improves effectiveness. Moreover, our method is highly efficient and can be used in a live television setting, i.e., in near real time.
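The recency-based weighting mentioned above can be illustrated with a small sketch. This is not the reinforcement learning approach from the paper; it only shows one simple way to decay term weights over a subtitle stream. The exponential decay, the `half_life` parameter, and the helper names are assumptions.

```python
from collections import defaultdict


def decayed_term_weights(subtitle_chunks, now, half_life=60.0):
    """Sum exponentially decayed counts per term.

    subtitle_chunks: iterable of (timestamp_seconds, text) pairs.
    half_life: seconds after which an occurrence counts half (assumed).
    """
    weights = defaultdict(float)
    for timestamp, text in subtitle_chunks:
        age = max(0.0, now - timestamp)
        decay = 0.5 ** (age / half_life)
        for term in text.lower().split():
            weights[term] += decay
    return dict(weights)


def top_query_terms(weights, n=3):
    """Pick the n highest-weighted terms, breaking ties alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


chunks = [(0.0, "election results"),
          (60.0, "election debate"),
          (120.0, "debate tonight")]
w = decayed_term_weights(chunks, now=120.0)
query = top_query_terms(w, n=2)
```

With these toy chunks, "election" occurs twice but only in older subtitles, so the more recent terms end up in the generated query.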

  • [PDF] D. Odijk, E. Meij, I. Sijaranamual, and M. de Rijke, “Dynamic query modeling for related content finding,” in SIGIR 2015: 38th international ACM SIGIR conference on Research and development in information retrieval, 2015.
    [Bibtex]
    @inproceedings{SIGIR:2015:Odijk,
    Author = {Odijk, Daan and Meij, Edgar and Sijaranamual, Isaac and de Rijke, Maarten},
    Booktitle = {{SIGIR 2015: 38th international ACM SIGIR conference on Research and development in information retrieval}},
    Date-Added = {2015-08-06 13:14:13 +0000},
    Date-Modified = {2015-08-06 13:39:24 +0000},
    Month = {August},
    Publisher = {ACM},
    Title = {Dynamic query modeling for related content finding},
    Year = {2015}}

Learning to Explain Entity Relationships in Knowledge Graphs

We study the problem of explaining relationships between pairs of knowledge graph entities with human-readable descriptions. Our method extracts and enriches sentences that refer to an entity pair from a corpus and ranks the sentences according to how well they describe the relationship between the entities. We model this task as a learning to rank problem for sentences and employ a rich set of features. When evaluated on a large set of manually annotated sentences, we find that our method significantly improves over state-of-the-art baseline models.

  • [PDF] N. Voskarides, E. Meij, M. Tsagkias, M. de Rijke, and W. Weerkamp, “Learning to explain entity relationships in knowledge graphs,” in Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing (volume 1: long papers), 2015, p. 564–574.
    [Bibtex]
    @inproceedings{ACL:2015:Voskarides,
    Author = {Voskarides, Nikos and Meij, Edgar and Tsagkias, Manos and de Rijke, Maarten and Weerkamp, Wouter},
    Booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
    Date-Added = {2015-08-06 13:08:02 +0000},
    Date-Modified = {2015-08-06 13:08:14 +0000},
    Location = {Beijing, China},
    Pages = {564--574},
    Publisher = {Association for Computational Linguistics},
    Title = {Learning to Explain Entity Relationships in Knowledge Graphs},
    Url = {http://aclweb.org/anthology/P15-1055},
    Year = {2015},
    Bdsk-Url-1 = {http://aclweb.org/anthology/P15-1055}}

Fast and Space-Efficient Entity Linking in Queries

Entity linking deals with identifying entities from a knowledge base in a given piece of text and has become a fundamental building block for web search engines, enabling numerous downstream improvements from better document ranking to enhanced search results pages. A key problem in the context of web search queries is that this process needs to run under severe time constraints as it has to be performed before any actual retrieval takes place, typically within milliseconds. In this paper we propose a probabilistic model that leverages user-generated information on the web to link queries to entities in a knowledge base. There are three key ingredients that make the algorithm fast and space-efficient. First, the linking process ignores any dependencies between the different entity candidates, which allows for an O(k^2) implementation in the number of query terms. Second, we leverage hashing and compression techniques to reduce the memory footprint. Finally, to equip the algorithm with contextual knowledge without sacrificing speed, we factor the distance between distributional semantics of the query words and entities into the model. We show that our solution significantly outperforms several state-of-the-art baselines by more than 14% while being able to process queries in sub-millisecond times, at least two orders of magnitude faster than existing systems.
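To see where the quadratic bound comes from, consider the sketch below. It is an illustration under assumptions, not the model from the paper: the alias table and its probabilities are made up, and scoring is a plain dictionary lookup. A query of k terms has k(k+1)/2 contiguous segments, and because each segment is scored independently of the others, the number of lookups grows quadratically in k.

```python
def segments(terms):
    """Yield every contiguous span of the query as (start, end, text)."""
    for i in range(len(terms)):
        for j in range(i + 1, len(terms) + 1):
            yield i, j, " ".join(terms[i:j])


def link_candidates(query, alias_table):
    """Score each segment independently against a toy alias table."""
    terms = query.lower().split()
    results = []
    for _start, _end, text in segments(terms):
        for entity, prob in alias_table.get(text, {}).items():
            results.append((prob, text, entity))
    return sorted(results, reverse=True)


# Hypothetical P(entity | alias) values, for illustration only.
ALIASES = {
    "new york": {"New_York_City": 0.8, "New_York_(state)": 0.2},
    "york": {"York": 0.9},
    "new york times": {"The_New_York_Times": 0.95},
}

query = "new york times subscription"
links = link_candidates(query, ALIASES)
n_segments = sum(1 for _ in segments(query.split()))
```

For this four-term query there are ten segments, and the best-scoring candidate in `links` is the one for the segment "new york times".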

  • [PDF] R. Blanco, G. Ottaviano, and E. Meij, “Fast and space-efficient entity linking in queries,” in Proceedings of the eighth ACM international conference on web search and data mining, 2015.
    [Bibtex]
    @inproceedings{WSDM:2015:blanco,
    Author = {Blanco, Roi and Ottaviano, Giuseppe and Meij, Edgar},
    Booktitle = {Proceedings of the eighth ACM international conference on Web search and data mining},
    Date-Added = {2011-10-26 11:21:51 +0200},
    Date-Modified = {2015-01-20 20:29:19 +0000},
    Series = {WSDM 2015},
    Title = {Fast and Space-Efficient Entity Linking in Queries},
    Year = {2015}}

CIKM 2014

Time-Aware Rank Aggregation for Microblog Search

We tackle the problem of searching microblog posts and frame it as a rank aggregation problem where we merge result lists generated by separate rankers so as to produce a final ranking to be returned to the user. We propose a rank aggregation method, TimeRA, that is able to infer the rank scores of documents via latent factor modeling. It is time-aware and rewards posts that are published in or near a burst of posts that are ranked highly in many of the lists being aggregated. Our experimental results show that it significantly outperforms state-of-the-art rank aggregation and time-sensitive microblog search algorithms.
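For readers unfamiliar with the rank aggregation setting, the sketch below merges several result lists with reciprocal rank fusion, a standard fusion technique. It is not TimeRA: there is no latent factor model and no burst detection here. The constant `k=60` is the commonly used default for this technique, and the post identifiers are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists by summing 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Three toy result lists from separate rankers.
lists = [["p1", "p2", "p3"],
         ["p2", "p1", "p4"],
         ["p2", "p3", "p1"]]
fused = reciprocal_rank_fusion(lists)
```

Post `p2` is ranked first by two of the three rankers and second by the other, so it ends up at the top of the fused list.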

Time-Aware Chi-squared for Document Filtering over Time

To appear at TAIA2013 (a SIGIR 2013 workshop).

Document filtering over time is widely applied in various tasks such as tracking topics in online news or social media. We consider it a classification task, where topics of interest correspond to classes, and the feature space consists of the words associated with each class. In “streaming” settings the set of words associated with a concept may change. In this paper we employ a multinomial Naive Bayes classifier and perform periodic feature selection to adapt to evolving topics. We propose two ways of employing Pearson’s χ² test for feature selection and demonstrate its benefit on the TREC KBA 2012 data set. By incorporating a time-dependent function in our equations for χ² we provide an elegant method for applying different weighting schemes. Experiments show improvements of our approach over a non-adaptive baseline.
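As a rough illustration of a time-dependent χ² statistic, the sketch below computes Pearson's χ² for a 2x2 term/class table and builds that table from decayed counts instead of raw counts. This is one plausible reading, not the formulation from the paper: the exponential decay, the `half_life` parameter, and the document tuple layout are assumptions.

```python
def chi_squared(a, b, c, d):
    """Pearson chi-squared statistic for a 2x2 term/class table.

    a: in-class docs with the term, b: out-of-class docs with the term,
    c: in-class docs without it,    d: out-of-class docs without it.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def time_weighted_table(docs, term, now, half_life=7.0):
    """Build the 2x2 table from exponentially decayed document counts.

    docs: iterable of (day, in_class, set_of_terms) tuples (assumed layout).
    """
    a = b = c = d = 0.0
    for day, in_class, terms in docs:
        w = 0.5 ** ((now - day) / half_life)
        if term in terms:
            if in_class:
                a += w
            else:
                b += w
        elif in_class:
            c += w
        else:
            d += w
    return a, b, c, d


perfect = chi_squared(10, 0, 0, 10)      # term perfectly predicts the class
independent = chi_squared(5, 5, 5, 5)    # term carries no information

docs = [(0.0, True, {"x"}), (7.0, False, {"y"})]
table = time_weighted_table(docs, "x", now=7.0)
```

A document that is one half-life old contributes half a count, so older evidence for a term counts less when features are re-selected.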

Do support groups members disclose less to their partners? The dynamics of HIV disclosure in four African countries

To appear in BMC Public Health.

Background: Recent efforts to curtail the HIV epidemic in Africa have emphasized preventing sexual transmission to partners through antiretroviral therapy. A component of current strategies is disclosure to partners, thus understanding its motivations will help maximise results. This study examines the rates, dynamics and consequences of partner disclosure in Burkina Faso, Kenya, Malawi and Uganda, with special attention to the role of support groups and stigma in disclosure.

Methods: The study employs mixed methods, including a cross-sectional client survey of counseling and testing services, focus groups, and in-depth interviews with HIV-positive individuals in stable partnerships in Burkina Faso, Kenya, Malawi and Uganda, recruited at healthcare facilities offering HIV testing.

Results: Rates of disclosure to partners varied between countries (32.7% – 92.7%). The lowest rate was reported in Malawi. Reasons for disclosure included preventing the transmission of HIV, the need for care, and upholding the integrity of the relationship. Fear of stigma was an important reason for non-disclosure. Women reported experiencing more negative reactions when disclosing to partners. Disclosure was positively associated with living in urban areas, higher education levels, and being male, and negatively associated with membership of support groups.

Conclusions: Understanding the reasons for disclosure and recognizing the role of support groups in the process can help improve current prevention efforts, which increasingly focus on treatment as prevention as a way to halt new infections. Support groups can help spread secondary prevention messages by explaining to their members that antiretroviral treatment has benefits for HIV-positive individuals and their partners. Home-based testing can further facilitate partner disclosure, as couples can test together and be counseled jointly.

Semantic TED

Multilingual Semantic Linking for Video Streams: Making “Ideas Worth Sharing” More Accessible

This paper describes our (winning!) submission to the Developers Challenge at WoLE2013, “Doing Good by Linking Entities.” We present a fully automatic system – called “Semantic TED” – which provides intelligent suggestions in the form of links to Wikipedia articles for video streams in multiple languages, based on the subtitles that accompany the visual content. The system is applied to online conference talks. In particular, we adapt a recently proposed semantic linking approach for streams of television broadcasts to facilitate generating contextual links while a TED talk is being viewed. TED is a highly popular global conference series covering many research domains; the publicly available talks have accumulated a total view count of over one billion at the time of writing. We exploit the multi-linguality of Wikipedia and the TED subtitles to provide contextual suggestions in the language of the user watching a video. In this way, a vast source of educational and intellectual content is disclosed to a broad audience that might otherwise experience difficulties interpreting it.

  • [PDF] D. Odijk, E. Meij, D. Graus, and T. Kenter, “Multilingual semantic linking for video streams: making “ideas worth sharing” more accessible,” in Proceedings of the 2nd international workshop on web of linked entities (WoLE 2013), 2013.
    [Bibtex]
    @inproceedings{WOLE:2013:Odijk,
    Author = {Odijk, Daan and Meij, Edgar and Graus, David and Kenter, Tom},
    Booktitle = {Proceedings of the 2nd International Workshop on Web of Linked Entities (WoLE 2013)},
    Date-Added = {2013-05-15 14:09:58 +0000},
    Date-Modified = {2013-05-15 14:11:37 +0000},
    Title = {Multilingual Semantic Linking for Video Streams: Making "Ideas Worth Sharing" More Accessible},
    Year = {2013}}