ECIR 2017

Generating descriptions of entity relationships

Large-scale knowledge graphs (KGs) store relationships between entities that are increasingly being used to improve the user experience in search applications. The structured nature of the data in KGs is typically not suitable to show to an end user and applications that utilize KGs therefore benefit from human-readable textual descriptions of KG relationships. We present a method that automatically generates textual descriptions of entity relationships by combining textual and KG information. Our method creates sentence templates for a particular relationship and then generates a textual description of a relationship instance by selecting the best template and filling it with appropriate entities. Experimental results show that a supervised variation of our method outperforms other variations as it captures the semantic similarity between a relationship instance and a template best, whilst providing more contextual information.
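To make the template idea concrete, here is a minimal sketch of the "select a template, then fill it with entities" step. It is not the method from the paper: the supervised template ranking is replaced by a simple stand-in heuristic (prefer the fillable template with the most slots), and the `Template` class, the slot names, and the `describe` function are all hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass

SLOT = re.compile(r"\{(\w+)\}")


@dataclass
class Template:
    relationship: str  # KG relationship this template verbalizes (assumed label)
    text: str          # sentence with named slots, e.g. "{subject} ... {object}"


def slots(template):
    """Return the set of slot names that occur in the template text."""
    return set(SLOT.findall(template.text))


def describe(relationship, instance, templates):
    """Fill the best template for a relationship instance.

    Stand-in scoring (an assumption, not the paper's ranker): among the
    templates whose slots can all be filled from `instance`, pick the one
    with the most slots, i.e. the one carrying the most context.
    """
    candidates = [
        t for t in templates
        if t.relationship == relationship and slots(t).issubset(instance)
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda t: len(slots(t)))
    return best.text.format(**instance)


templates = [
    Template("starred_in", "{subject} appeared in {object}."),
    Template("starred_in",
             "{subject} appeared in {object}, a film directed by {director}."),
]
instance = {"subject": "Brad Pitt", "object": "Seven",
            "director": "David Fincher"}
print(describe("starred_in", instance, templates))
```

A learned ranker would replace the `max(...)` call; the rest of the flow (filter to fillable templates, then fill) stays the same.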

  • [PDF] N. Voskarides, E. Meij, and M. de Rijke, “Generating descriptions of entity relationships,” in ECIR 2017: 39th European Conference on Information Retrieval, 2017.
    Author = {Voskarides, Nikos and Meij, Edgar and de Rijke, Maarten},
    Booktitle = {ECIR 2017: 39th European Conference on Information Retrieval},
    Date-Added = {2017-01-10 21:27:37 +0000},
    Date-Modified = {2017-01-10 21:27:58 +0000},
    Month = {April},
    Publisher = {Springer},
    Series = {LNCS},
    Title = {Generating descriptions of entity relationships},
    Year = {2017}}
WSDM 2017

Utilizing Knowledge Bases in Text-centric Information Retrieval (WSDM 2017)

The past decade has witnessed the emergence of several publicly available and proprietary knowledge graphs (KGs). The increasing depth and breadth of content in KGs makes them not only rich sources of structured knowledge by themselves but also valuable resources for search systems. A surge of recent developments in entity linking and retrieval methods gave rise to a new line of research that aims at utilizing KGs for text-centric retrieval applications, making this an ideal time to pause and report current findings to the community, summarizing successful approaches, and soliciting new ideas. This tutorial is the first to disseminate the progress in this emerging field to researchers and practitioners.

CIKM 2016

Document Filtering for Long-tail Entities

Filtering relevant documents with respect to entities is an essential task in the context of knowledge base construction and maintenance. It entails processing a time-ordered stream of documents that might be relevant to an entity in order to select only those that contain vital information. State-of-the-art approaches to document filtering for popular entities are entity-dependent: they rely on, and are trained on, the features that distinguish each specific entity. Moreover, these approaches tend to use so-called extrinsic information, such as Wikipedia page views and related entities, which is typically available only for popular head entities. Entity-dependent approaches based on such signals are therefore ill-suited as filtering methods for long-tail entities.

Utilizing Knowledge Bases in Text-centric Information Retrieval (ICTIR 2016)

General-purpose knowledge bases keep growing in terms of depth (content) and width (coverage). Moreover, algorithms for entity linking and entity retrieval have improved tremendously in the past years. These developments give rise to a new line of research that exploits and combines them for the purposes of text-centric information retrieval applications. This tutorial focuses on a) how to retrieve a set of entities for an ad-hoc query or, more broadly, how to assess the relevance of KB elements for the information need, b) how to annotate text with such elements, and c) how to use this information to assess the relevance of text. We discuss different kinds of information available in a knowledge graph and how to leverage each most effectively.

Mining, ranking and recommending entity aspects

Entity queries constitute a large fraction of web search queries and most of these queries are in the form of an entity mention plus some context terms that represent an intent in the context of that entity. We refer to these entity-oriented search intents as entity aspects. Recognizing entity aspects in a query can improve various search applications such as providing direct answers, diversifying search results, and recommending queries. In this paper we focus on the tasks of identifying, ranking, and recommending entity aspects, and propose an approach that mines, clusters, and ranks such aspects from query logs.

Learning to Explain Entity Relationships in Knowledge Graphs

We study the problem of explaining relationships between pairs of knowledge graph entities with human-readable descriptions. Our method extracts and enriches sentences that refer to an entity pair from a corpus and ranks the sentences according to how well they describe the relationship between the entities. We model this task as a learning to rank problem for sentences and employ a rich set of features. When evaluated on a large set of manually annotated sentences, we find that our method significantly improves over state-of-the-art baseline models.

  • [PDF] N. Voskarides, E. Meij, M. Tsagkias, M. de Rijke, and W. Weerkamp, “Learning to explain entity relationships in knowledge graphs,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015, pp. 564-574.
    Author = {Voskarides, Nikos and Meij, Edgar and Tsagkias, Manos and de Rijke, Maarten and Weerkamp, Wouter},
    Booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
    Date-Added = {2015-08-06 13:08:02 +0000},
    Date-Modified = {2015-08-06 13:08:14 +0000},
    Location = {Beijing, China},
    Pages = {564--574},
    Publisher = {Association for Computational Linguistics},
    Title = {Learning to Explain Entity Relationships in Knowledge Graphs},
    Year = {2015}}

Linking queries to entities

I’m happy to announce we’re releasing a new test collection for entity linking for web queries (within user sessions) to Wikipedia. About half of the queries in this dataset are sampled from Yahoo search logs, the other half comes from the TREC Session track. Check out the L24 dataset on Yahoo Webscope, or drop me a line for more information. Below you’ll find an excerpt of the README text associated with it.

With this dataset you can train, test, and benchmark entity linking systems on the task of linking web search queries – within the context of a search session – to entities. Entities are a key enabling component for semantic search, as many information needs can be answered by returning a list of entities, their properties, and/or their relations. A first step in any such scenario is to determine which entities appear in a query – a process commonly referred to as named entity resolution, named entity disambiguation, or semantic linking.

This dataset allows researchers and other practitioners to evaluate their systems for linking web search engine queries to entities. The dataset contains manually identified links to entities in the form of Wikipedia articles and provides the means to train, test, and benchmark such systems using manually created, gold standard data. By releasing this dataset publicly, we aim to foster research into entity linking systems for web search queries. To this end, we also include sessions and queries from the TREC Session track (years 2010–2013). Moreover, since the linked entities are aligned with a specific part of each query (a “span”), this data can also be used to evaluate systems that identify spans in queries, i.e., that perform query segmentation for web search queries, in the context of search sessions.
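As an illustration of span-level evaluation, the sketch below scores predicted spans against gold spans with exact-match precision, recall, and F1. The README does not prescribe a metric, so exact matching is an assumption, as are the function name `span_prf` and the representation of a span as a `(start, end)` token-offset pair.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over spans of one query.

    Spans are (start, end) token offsets; this representation and the
    exact-match criterion are assumptions made for this sketch.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical four-token query with two gold spans.
gold_spans = [(0, 2), (3, 4)]
predicted_spans = [(0, 2), (2, 4)]
print(span_prf(gold_spans, predicted_spans))
```

The same function works for entity links if each item is extended to a `(start, end, article)` triple, so that a prediction only counts when both the span and the linked article match.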

The key properties of the dataset are as follows.

  • Queries are taken from Yahoo US Web Search and from the TREC Session track (2010–2013).
  • There are 2635 queries in 980 sessions, 7482 spans, and 5964 links to Wikipedia articles in this dataset.
  • The annotations include the part of the query (the “span”) that is linked to each Wikipedia article. This information can also be used for query segmentation experiments.
  • The annotators have identified the “main” entity/ies for each query, if available.
  • The annotators also labeled the queries, identifying whether they are non-English, navigational, quote-or-question, adult, or ambiguous and also if an out-of-Wikipedia entity is mentioned in the query, i.e., when an entity is mentioned in a query but no suitable Wikipedia article exists.
  • The file includes session information: each session consists of an anonymized id, initial query, as well as all the queries issued within the same session and their relative date/timestamp if available.
  • Sessions are demarcated using a 30-minute time-out.
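For readers who want to apply the same kind of session boundary to their own logs, here is a minimal sketch of time-out based session splitting. It assumes the time-out is measured between consecutive queries of one user and that a gap strictly longer than 30 minutes starts a new session; the README does not spell out either detail, and `split_sessions` is a hypothetical helper, not part of the dataset tooling.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


def split_sessions(events, timeout=SESSION_TIMEOUT):
    """Group one user's (timestamp, query) pairs into sessions.

    A new session starts whenever the gap to the previous query exceeds
    `timeout` (an assumed reading of the 30-minute rule).
    """
    sessions = []
    previous = None
    for timestamp, query in sorted(events):
        if previous is None or timestamp - previous > timeout:
            sessions.append([])  # open a new session
        sessions[-1].append(query)
        previous = timestamp
    return sessions


start = datetime(2014, 1, 1, 12, 0)
log = [
    (start, "first query"),
    (start + timedelta(minutes=10), "second query"),
    (start + timedelta(minutes=45), "third query"),
]
print(split_sessions(log))
```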

Entity Linking and Retrieval Tutorial @ SIGIR 2013 – Slides, Code, and Bibliography

The material for our “Entity Linking and Retrieval” tutorial (with Krisztian Balog and Daan Odijk) for SIGIR 2013 has been updated and is available online on GitHub (slides), Dropbox (slides), Mendeley, and Codecademy. All material is summarized on the webpage for the tutorial. See my other blogpost for a brief summary.

Do support groups members disclose less to their partners? The dynamics of HIV disclosure in four African countries

To appear in BMC Public Health.

Background: Recent efforts to curtail the HIV epidemic in Africa have emphasized preventing sexual transmission to partners through antiretroviral therapy. A component of current strategies is disclosure to partners; understanding its motivations will therefore help maximise results. This study examines the rates, dynamics and consequences of partner disclosure in Burkina Faso, Kenya, Malawi and Uganda, with special attention to the role of support groups and stigma in disclosure.

Methods: The study employs mixed methods, including a cross-sectional client survey of counseling and testing services, focus groups, and in-depth interviews with HIV-positive individuals in stable partnerships in Burkina Faso, Kenya, Malawi and Uganda, recruited at healthcare facilities offering HIV testing.

Results: Rates of disclosure to partners varied between countries (32.7%–92.7%). The lowest rate was reported in Malawi. Reasons for disclosure included preventing the transmission of HIV, the need for care, and upholding the integrity of the relationship. Fear of stigma was an important reason for non-disclosure. Women reported experiencing more negative reactions when disclosing to partners. Disclosure was positively associated with living in urban areas, higher education levels, and being male, and negatively associated with membership of support groups.

Conclusions: Understanding the reasons for disclosure and recognizing the role of support groups in the process can help improve current prevention efforts, which increasingly focus on treatment as prevention as a way to halt new infections. Support groups can help spread secondary prevention messages by explaining to their members that antiretroviral treatment has benefits for HIV-positive individuals and their partners. Home-based testing can further facilitate partner disclosure, as couples can test together and be counseled jointly.

Semantic TED

Multilingual Semantic Linking for Video Streams: Making “Ideas Worth Sharing” More Accessible

This paper describes our (winning!) submission to the Developers Challenge at WoLE2013, “Doing Good by Linking Entities.” We present a fully automatic system – called “Semantic TED” – which provides intelligent suggestions in the form of links to Wikipedia articles for video streams in multiple languages, based on the subtitles that accompany the visual content. The system is applied to online conference talks. In particular, we adapt a recently proposed semantic linking approach for streams of television broadcasts to facilitate generating contextual links while a TED talk is being viewed. TED is a highly popular global conference series covering many research domains; the publicly available talks have accumulated a total view count of over one billion at the time of writing. We exploit the multi-linguality of Wikipedia and the TED subtitles to provide contextual suggestions in the language of the user watching a video. In this way, a vast source of educational and intellectual content is disclosed to a broad audience that might otherwise experience difficulties interpreting it.

  • [PDF] D. Odijk, E. Meij, D. Graus, and T. Kenter, “Multilingual semantic linking for video streams: making "ideas worth sharing" more accessible,” in Proceedings of the 2nd International Workshop on Web of Linked Entities (WoLE 2013), 2013.
    Author = {Odijk, Daan and Meij, Edgar and Graus, David and Kenter, Tom},
    Booktitle = {Proceedings of the 2nd International Workshop on Web of Linked Entities (WoLE 2013)},
    Date-Added = {2013-05-15 14:09:58 +0000},
    Date-Modified = {2013-05-15 14:11:37 +0000},
    Title = {Multilingual Semantic Linking for Video Streams: Making "Ideas Worth Sharing" More Accessible},
    Year = {2013}}