Weakly-supervised Contextualization of Knowledge Graph Facts

Knowledge graphs (KGs) model facts about the world; they consist of nodes (entities such as companies and people) that are connected by edges (relations such as founderOf). Facts encoded in KGs are frequently used by search applications to augment result pages. When presenting a KG fact to the user, providing other facts that are pertinent to that main fact can enrich the user experience and support exploratory information needs. KG fact contextualization is the task of augmenting a given KG fact with additional and useful KG facts. The task is challenging because of the large size of KGs; discovering other relevant facts even in a small neighborhood of the given fact results in an enormous number of candidates. We introduce a neural fact contextualization method (NFCM) to address the KG fact contextualization task. NFCM first generates a set of candidate facts in the neighborhood of a given fact and then ranks the candidate facts using a supervised learning to rank model. The ranking model combines features that are automatically learned from data and represent the query and candidate facts with a set of hand-crafted features that we devised or adjusted for this task. In order to obtain the annotations required to train the learning to rank model at scale, we generate training data automatically using distant supervision on a large entity-tagged text corpus. We show that ranking functions learned on this data are effective at contextualizing KG facts. Evaluation using human assessors shows that NFCM significantly outperforms several competitive baselines.
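The candidate generation and ranking pipeline described above can be illustrated with a toy sketch. It assumes a KG stored as a list of (subject, relation, object) triples and takes the 1-hop neighborhood of the query fact's two entities as the candidate set. Purely for illustration, it replaces NFCM's learned neural features and trained ranker with two hypothetical hand-crafted features and fixed weights. The distant-supervision labeler is likewise an assumed heuristic (a candidate counts as relevant if its entities co-occur with the query fact's entities in one entity-tagged sentence), not necessarily the labeling rule used in the paper.

```python
from typing import Dict, List, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


def candidate_facts(kg: List[Fact], query: Fact) -> List[Fact]:
    """Return facts that share the query fact's subject or object (1-hop neighborhood)."""
    entities = {query[0], query[2]}
    return [f for f in kg if f != query and (f[0] in entities or f[2] in entities)]


def features(query: Fact, cand: Fact) -> Dict[str, float]:
    """Two hypothetical hand-crafted features (not the paper's feature set)."""
    shared = {query[0], query[2]} & {cand[0], cand[2]}
    return {
        "shared_entities": float(len(shared)),
        "same_relation": float(cand[1] == query[1]),
    }


def rank(kg: List[Fact], query: Fact, weights: Dict[str, float]) -> List[Fact]:
    """Rank candidates with a fixed linear scorer standing in for a trained LTR model."""
    def score(cand: Fact) -> float:
        return sum(weights.get(name, 0.0) * value
                   for name, value in features(query, cand).items())
    return sorted(candidate_facts(kg, query), key=score, reverse=True)


def distant_label(query: Fact, cand: Fact, sentences: List[Set[str]]) -> int:
    """Assumed heuristic: 1 if one tagged sentence mentions all entities of both facts."""
    needed = {query[0], query[2], cand[0], cand[2]}
    return int(any(needed <= sentence for sentence in sentences))


kg = [
    ("Larry_Page", "founderOf", "Google"),
    ("Sergey_Brin", "founderOf", "Google"),
    ("Google", "headquarteredIn", "Mountain_View"),
    ("Larry_Page", "bornIn", "East_Lansing"),
    ("Paris", "capitalOf", "France"),
]
query = ("Larry_Page", "founderOf", "Google")
# Each sentence of the tagged corpus is reduced to the set of entities it mentions.
sentences = [{"Larry_Page", "Sergey_Brin", "Google"}, {"Google", "Mountain_View"}]

ranking = rank(kg, query, {"shared_entities": 1.0, "same_relation": 2.0})
labels = {c: distant_label(query, c, sentences) for c in candidate_facts(kg, query)}
print(ranking[0])  # ('Sergey_Brin', 'founderOf', 'Google')
```

Here the same-relation fact ranks first only because of the assumed weights; in the approach described above, the ranking function is instead learned from distantly supervised labels such as those in `labels`.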

  • [DOI] N. Voskarides, E. Meij, R. Reinanda, A. Khaitan, M. Osborne, G. Stefanoni, P. Kambadur, and M. de Rijke, “Weakly-supervised contextualization of knowledge graph facts,” in The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, New York, NY, USA, 2018, pp. 765–774.
    @inproceedings{SIGIR:2018:Voskarides,
    Acmid = {3210031},
    Address = {New York, NY, USA},
    Author = {Voskarides, Nikos and Meij, Edgar and Reinanda, Ridho and Khaitan, Abhinav and Osborne, Miles and Stefanoni, Giorgio and Kambadur, Prabhanjan and de Rijke, Maarten},
    Booktitle = {The 41st International ACM SIGIR Conference on Research \& Development in Information Retrieval},
    Date-Added = {2018-07-26 18:23:41 +0000},
    Date-Modified = {2018-07-26 18:31:57 +0000},
    Doi = {10.1145/3209978.3210031},
    Isbn = {978-1-4503-5657-2},
    Keywords = {distant supervision, fact contextualization, knowledge graphs},
    Location = {Ann Arbor, MI, USA},
    Numpages = {10},
    Pages = {765--774},
    Publisher = {ACM},
    Series = {SIGIR '18},
    Title = {Weakly-supervised Contextualization of Knowledge Graph Facts},
    Url = {http://doi.acm.org/10.1145/3209978.3210031},
    Year = {2018},
    Bdsk-Url-1 = {http://doi.acm.org/10.1145/3209978.3210031},
    Bdsk-Url-2 = {https://doi.org/10.1145/3209978.3210031}}

The Second Workshop on Knowledge Graphs and Semantics for Text Retrieval, Analysis, and Understanding (KG4IR)

Semantic technologies such as controlled vocabularies, thesauri, and knowledge graphs have been used throughout the history of information retrieval for a variety of tasks. Recent advances in knowledge acquisition, alignment, and utilization have given rise to a body of new approaches for utilizing knowledge graphs in text retrieval tasks, and it is therefore time to consolidate the community efforts and study how such technologies can be employed most effectively in information retrieval systems. It is also time to start and deepen the dialogue between researchers and practitioners to ensure that breakthroughs, technologies, and algorithms in this space are widely disseminated. The goal of this workshop, co-located with SIGIR 2018, is to bring together and grow a community of researchers and practitioners who are interested in using, aligning, and constructing knowledge graphs and similar semantic resources for information retrieval applications. See https://kg4ir.github.io/ for more info.

  • [DOI] L. Dietz, C. Xiong, J. Dalton, and E. Meij, “The second workshop on knowledge graphs and semantics for text retrieval, analysis, and understanding (KG4IR),” in The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, New York, NY, USA, 2018, pp. 1423–1426.
    @inproceedings{SIGIR:2018:Dietz-WS,
    Acmid = {3210196},
    Address = {New York, NY, USA},
    Author = {Dietz, Laura and Xiong, Chenyan and Dalton, Jeff and Meij, Edgar},
    Booktitle = {The 41st International ACM SIGIR Conference on Research \& Development in Information Retrieval},
    Date-Added = {2018-07-26 18:25:34 +0000},
    Date-Modified = {2018-07-26 18:31:50 +0000},
    Doi = {10.1145/3209978.3210196},
    Isbn = {978-1-4503-5657-2},
    Keywords = {entity linking, entity retrieval, entity-oriented search, information retrieval, knowledge graphs},
    Location = {Ann Arbor, MI, USA},
    Numpages = {4},
    Pages = {1423--1426},
    Publisher = {ACM},
    Series = {SIGIR '18},
    Title = {The Second Workshop on Knowledge Graphs and Semantics for Text Retrieval, Analysis, and Understanding (KG4IR)},
    Url = {http://doi.acm.org/10.1145/3209978.3210196},
    Year = {2018},
    Bdsk-Url-1 = {http://doi.acm.org/10.1145/3209978.3210196},
    Bdsk-Url-2 = {https://doi.org/10.1145/3209978.3210196}}

Utilizing Knowledge Graphs for Text-Centric Information Retrieval

The past decade has witnessed the emergence of several publicly available and proprietary knowledge graphs (KGs). The depth and breadth of content in these KGs made them not only rich sources of structured knowledge by themselves, but also valuable resources for search systems. A surge of recent developments in entity linking and entity retrieval methods gave rise to a new line of research that aims at utilizing KGs for text-centric retrieval applications. This tutorial is the first to summarize and disseminate the progress in this emerging area to industry practitioners and researchers.

  • [DOI] L. Dietz, A. Kotov, and E. Meij, “Utilizing knowledge graphs for text-centric information retrieval,” in The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, New York, NY, USA, 2018, pp. 1387–1390.
    @inproceedings{SIGIR:2018:Dietz-Tut,
    Acmid = {3210187},
    Address = {New York, NY, USA},
    Author = {Dietz, Laura and Kotov, Alexander and Meij, Edgar},
    Booktitle = {The 41st International ACM SIGIR Conference on Research \& Development in Information Retrieval},
    Date-Added = {2018-07-26 18:24:31 +0000},
    Date-Modified = {2018-07-26 18:31:50 +0000},
    Doi = {10.1145/3209978.3210187},
    Isbn = {978-1-4503-5657-2},
    Keywords = {entity linking, entity retrieval, information retrieval, knowledge graphs},
    Location = {Ann Arbor, MI, USA},
    Numpages = {4},
    Pages = {1387--1390},
    Publisher = {ACM},
    Series = {SIGIR '18},
    Title = {Utilizing Knowledge Graphs for Text-Centric Information Retrieval},
    Url = {http://doi.acm.org/10.1145/3209978.3210187},
    Year = {2018},
    Bdsk-Url-1 = {http://doi.acm.org/10.1145/3209978.3210187},
    Bdsk-Url-2 = {https://doi.org/10.1145/3209978.3210187}}

The First Workshop on Knowledge Graphs and Semantics for Text Retrieval and Analysis (KG4IR)

Knowledge graphs have been used throughout the history of information retrieval for a variety of tasks. Advances in knowledge acquisition and alignment technology in the last few years have given rise to a body of new approaches for utilizing knowledge graphs in text retrieval tasks. This report presents the motivation, output, and outlook of the first workshop on Knowledge Graphs and Semantics for Text Retrieval and Analysis, which was co-located with SIGIR 2017 in Tokyo, Japan. We aim to assess where we stand today, what the future directions are, and which preconditions could lead to further performance increases. See https://kg4ir.github.io/ for more info.

  • [DOI] L. Dietz, C. Xiong, and E. Meij, “The first workshop on knowledge graphs and semantics for text retrieval and analysis (KG4IR),” in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA, 2017, pp. 1427–1428.
    @inproceedings{SIGIR:2017:Dietz,
    Acmid = {3084371},
    Address = {New York, NY, USA},
    Author = {Dietz, Laura and Xiong, Chenyan and Meij, Edgar},
    Booktitle = {Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval},
    Date-Added = {2018-07-26 18:17:39 +0000},
    Date-Modified = {2018-07-26 18:17:51 +0000},
    Doi = {10.1145/3077136.3084371},
    Isbn = {978-1-4503-5022-8},
    Keywords = {entities, information retrieval, knowledge graphs},
    Location = {Shinjuku, Tokyo, Japan},
    Numpages = {2},
    Pages = {1427--1428},
    Publisher = {ACM},
    Series = {SIGIR '17},
    Title = {The First Workshop on Knowledge Graphs and Semantics for Text Retrieval and Analysis (KG4IR)},
    Url = {http://doi.acm.org/10.1145/3077136.3084371},
    Year = {2017},
    Bdsk-Url-1 = {http://doi.acm.org/10.1145/3077136.3084371},
    Bdsk-Url-2 = {https://doi.org/10.1145/3077136.3084371}}

ECIR 2017

Generating descriptions of entity relationships

Large-scale knowledge graphs (KGs) store relationships between entities that are increasingly being used to improve the user experience in search applications. The structured nature of the data in KGs is typically not suitable to show to an end user, and applications that utilize KGs therefore benefit from human-readable textual descriptions of KG relationships. We present a method that automatically generates textual descriptions of entity relationships by combining textual and KG information. Our method creates sentence templates for a particular relationship and then generates a textual description of a relationship instance by selecting the best template and filling it with appropriate entities. Experimental results show that a supervised variation of our method outperforms other variations, as it best captures the semantic similarity between a relationship instance and a template, whilst providing more contextual information.
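A rough sketch of the template selection and filling step, assuming templates are strings with bracketed slot names such as `[SUBJ]` and `[OBJ]` (a hypothetical format, not the paper's). Only templates whose slots can all be filled are considered, and simple word overlap with a set of context words stands in for the learned semantic similarity used by the supervised variant described above.

```python
import re
from typing import Dict, List, Optional, Set

SLOT = re.compile(r"\[(\w+)\]")  # hypothetical slot syntax, e.g. "[SUBJ]"


def can_fill(template: str, slots: Dict[str, str]) -> bool:
    """True if every slot in the template has a value."""
    return all(name in slots for name in SLOT.findall(template))


def overlap(template: str, context: Set[str]) -> int:
    """Count template words (slots removed) that also appear in the context words."""
    words = set(SLOT.sub(" ", template).lower().split())
    return len(words & context)


def describe(templates: List[str], slots: Dict[str, str], context: Set[str]) -> Optional[str]:
    """Pick the fillable template with the highest overlap and fill in the entities."""
    usable = [t for t in templates if can_fill(t, slots)]
    if not usable:
        return None
    best = max(usable, key=lambda t: overlap(t, context))
    return SLOT.sub(lambda m: slots[m.group(1)], best)


templates = [
    "[SUBJ] is the founder of [OBJ].",
    "[SUBJ] co-founded [OBJ] with [OTHER].",
    "[SUBJ] founded [OBJ] in [YEAR].",
]
slots = {"SUBJ": "Larry Page", "OBJ": "Google", "YEAR": "1998"}
print(describe(templates, slots, {"founded", "in"}))  # Larry Page founded Google in 1998.
```

The second template is never chosen here because no value is available for `[OTHER]`; a learned similarity model would replace `overlap` without changing the rest of the flow.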

  • [PDF] N. Voskarides, E. Meij, and M. de Rijke, “Generating descriptions of entity relationships,” in ECIR 2017: 39th European Conference on Information Retrieval, 2017.
    @inproceedings{ECIR:2017:voskarides,
    Author = {Voskarides, Nikos and Meij, Edgar and de Rijke, Maarten},
    Booktitle = {ECIR 2017: 39th European Conference on Information Retrieval},
    Date-Added = {2017-01-10 21:27:37 +0000},
    Date-Modified = {2017-01-10 21:27:58 +0000},
    Month = {April},
    Publisher = {Springer},
    Series = {LNCS},
    Title = {Generating descriptions of entity relationships},
    Year = {2017}}
WSDM 2017

Utilizing Knowledge Bases in Text-centric Information Retrieval (WSDM 2017)

The past decade has witnessed the emergence of several publicly available and proprietary knowledge graphs (KGs). The increasing depth and breadth of content in KGs makes them not only rich sources of structured knowledge by themselves but also valuable resources for search systems. A surge of recent developments in entity linking and retrieval methods gave rise to a new line of research that aims at utilizing KGs for text-centric retrieval applications, making this an ideal time to pause and report current findings to the community, summarizing successful approaches, and soliciting new ideas. This tutorial is the first to disseminate the progress in this emerging field to researchers and practitioners.

CIKM 2016

Document Filtering for Long-tail Entities

Filtering relevant documents with respect to entities is an essential task in the context of knowledge base construction and maintenance. It entails processing a time-ordered stream of documents that might be relevant to an entity in order to select only those that contain vital information. State-of-the-art approaches to document filtering for popular entities are entity-dependent: they rely on, and are trained on, differentiating features that are specific to each individual entity. Moreover, these approaches tend to use so-called extrinsic information, such as Wikipedia page views and related entities, which is typically available only for popular head entities. Entity-dependent approaches based on such signals are therefore ill-suited as filtering methods for long-tail entities.

Utilizing Knowledge Bases in Text-centric Information Retrieval (ICTIR 2016)

General-purpose knowledge bases keep growing in terms of both depth (content) and width (coverage). Moreover, algorithms for entity linking and entity retrieval have improved tremendously in recent years. These advances give rise to a new line of research that exploits and combines them for the purposes of text-centric information retrieval applications. This tutorial focuses on a) how to retrieve a set of entities for an ad-hoc query or, more broadly, how to assess the relevance of KB elements to the information need, b) how to annotate text with such elements, and c) how to use this information to assess the relevance of text. We discuss the different kinds of information available in a knowledge graph and how to leverage each most effectively.

WSDM 2016

Dynamic Collective Entity Representations for Entity Ranking

Entity ranking, i.e., successfully positioning a relevant entity at the top of the ranking for a given query, is inherently difficult due to the potential mismatch between the entity’s description in a knowledge base and the way people refer to the entity when searching for it. To counter this issue, we propose a method for constructing dynamic collective entity representations. We collect entity descriptions from a variety of sources and combine them into a single entity representation by learning to weight the content from the different sources associated with an entity for optimal retrieval effectiveness. Our method is able to add new descriptions in real time and learn the best representation as time evolves, so as to capture the dynamics of how people search for entities. Incorporating dynamic description sources into dynamic collective entity representations improves retrieval effectiveness by 7% over a state-of-the-art learning to rank baseline. Periodic retraining of the ranker yields higher ranking effectiveness for dynamic collective entity representations.
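The idea of learning how much to trust each description source can be sketched as a linear combination of per-field match scores whose weights are updated from ranking mistakes. This is a simplified stand-in, not the paper's ranker: the field names (`kb`, `tweets`), the term-overlap score, and the pairwise perceptron update are all assumptions made for illustration.

```python
from typing import Dict, List

Entity = Dict[str, List[str]]  # field (description source) -> texts collected for it


def field_features(query: str, entity: Entity) -> Dict[str, float]:
    """Per-field score: number of query terms that occur in that field's text."""
    q = set(query.lower().split())
    return {field: float(len(q & set(" ".join(texts).lower().split())))
            for field, texts in entity.items()}


def score(query: str, entity: Entity, weights: Dict[str, float]) -> float:
    """Weighted sum of the per-field scores."""
    return sum(weights.get(f, 0.0) * v for f, v in field_features(query, entity).items())


def add_description(entity: Entity, field: str, text: str) -> None:
    """Attach a newly observed description (e.g., a tweet) to an entity at any time."""
    entity.setdefault(field, []).append(text)


def perceptron_update(query: str, relevant: Entity, other: Entity,
                      weights: Dict[str, float], lr: float = 1.0) -> None:
    """Pairwise perceptron step: if `other` is not ranked below `relevant`, shift weights."""
    if score(query, relevant, weights) <= score(query, other, weights):
        fr, fo = field_features(query, relevant), field_features(query, other)
        for f in set(fr) | set(fo):
            weights[f] = weights.get(f, 0.0) + lr * (fr.get(f, 0.0) - fo.get(f, 0.0))


weights = {"kb": 1.0, "tweets": 0.0}
beyonce = {"kb": ["Beyonce Knowles American singer"]}
queen_band = {"kb": ["Queen British rock band"]}
query = "queen bey"

# The KB text does not match how users refer to the entity; a new source does.
add_description(beyonce, "tweets", "queen bey slays again")
perceptron_update(query, beyonce, queen_band, weights)
print(weights)  # {'kb': 0.0, 'tweets': 2.0}
```

After one update, the new source carries the weight and the intended entity outranks the distractor; repeating such updates as descriptions arrive is one simple way to mimic the periodic retraining mentioned above.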

  • [PDF] D. Graus, M. Tsagkias, W. Weerkamp, E. Meij, and M. de Rijke, “Dynamic collective entity representations for entity ranking,” in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, 2016.
    @inproceedings{WSDM:2016:Graus,
    Author = {Graus, David and Tsagkias, Manos and Weerkamp, Wouter and Meij, Edgar and de Rijke, Maarten},
    Booktitle = {Proceedings of the ninth ACM international conference on Web search and data mining},
    Date-Added = {2016-01-07 17:24:16 +0000},
    Date-Modified = {2016-01-07 17:25:55 +0000},
    Series = {WSDM 2016},
    Title = {Dynamic Collective Entity Representations for Entity Ranking},
    Year = {2016}}

Mining, ranking and recommending entity aspects

Entity queries constitute a large fraction of web search queries, and most of these queries are in the form of an entity mention plus some context terms that represent an intent in the context of that entity. We refer to these entity-oriented search intents as entity aspects. Recognizing entity aspects in a query can improve various search applications, such as providing direct answers, diversifying search results, and recommending queries. In this paper, we focus on the tasks of identifying, ranking, and recommending entity aspects, and propose an approach that mines, clusters, and ranks such aspects from query logs.
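A minimal sketch of the mine, cluster, and rank steps over a query log of (query, frequency) pairs. It assumes the entity mention is matched by plain substring search and treats the remaining terms as the aspect. Aspects containing the same words in any order are grouped together (a crude stand-in for the clustering used in the paper), and clusters are ranked by total query frequency.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def extract_aspect(query: str, entity: str) -> Optional[str]:
    """Strip the entity mention from the query; the remaining terms form the aspect."""
    q, e = query.lower(), entity.lower()
    if e not in q:
        return None
    rest = q.replace(e, " ").split()
    return " ".join(rest) if rest else None


def rank_aspects(log: List[Tuple[str, int]], entity: str) -> List[str]:
    """Cluster aspects by their word set, rank clusters by frequency, return one label each."""
    clusters: Dict[frozenset, Counter] = {}
    for query, freq in log:
        aspect = extract_aspect(query, entity)
        if aspect is None:
            continue
        clusters.setdefault(frozenset(aspect.split()), Counter())[aspect] += freq
    ranked = sorted(clusters.values(), key=lambda c: sum(c.values()), reverse=True)
    # Label each cluster with its most frequent surface form.
    return [c.most_common(1)[0][0] for c in ranked]


log = [
    ("barack obama age", 50),
    ("barack obama wife name", 40),
    ("barack obama name wife", 25),
    ("barack obama height", 10),
    ("weather amsterdam", 100),
]
print(rank_aspects(log, "Barack Obama"))  # ['wife name', 'age', 'height']
```

The two word orders of "wife name" merge into one cluster whose combined frequency puts it ahead of "age"; a recommender could then suggest the top-ranked aspects for the entity.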