Utilizing Knowledge Bases in Text-centric Information Retrieval (ICTIR 2016)
General-purpose knowledge bases continue to grow in both depth (content) and width (coverage). Moreover, algorithms for entity linking and entity retrieval have improved tremendously in recent years. Together, these developments give rise to a new line of research that exploits and combines them for text-centric information retrieval applications. This tutorial focuses on a) how to retrieve a set of entities for an ad-hoc query or, more broadly, how to assess the relevance of KB elements for the information need, b) how to annotate text with such elements, and c) how to use this information to assess the relevance of text. We discuss the different kinds of information available in a knowledge graph and how to leverage each most effectively.
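As a rough illustration of point c), the sketch below interpolates a simple term-overlap score with an entity-overlap score. It is a minimal sketch under stated assumptions, not one of the systems covered in the tutorial: the function names, the interpolation weight `lam`, the entity identifiers, and the query-entity weights are all hypothetical. It further assumes that query entities (point a) and document annotations (point b) have already been produced by other components.

```python
def term_overlap(query, text):
    """Fraction of distinct query terms that also occur in the text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def entity_overlap(query_entities, doc_entities):
    """Sum the relevance weights of query entities annotated in the document."""
    return sum(w for e, w in query_entities.items() if e in doc_entities)


def score(query, query_entities, doc_text, doc_entities, lam=0.5):
    """Linear interpolation of text evidence and entity evidence (lam is assumed)."""
    text_part = term_overlap(query, doc_text)
    entity_part = entity_overlap(query_entities, doc_entities)
    return (1 - lam) * text_part + lam * entity_part


# Hypothetical output of entity retrieval for the query (point a).
query = "jaguar speed"
query_entities = {"Jaguar_(animal)": 0.8, "Jaguar_Cars": 0.2}

# Hypothetical documents with entity annotations (point b).
docs = {
    "d1": ("the jaguar can reach a top speed when hunting", {"Jaguar_(animal)"}),
    "d2": ("jaguar unveils a new car with high top speed", {"Jaguar_Cars"}),
}

ranking = sorted(
    docs,
    key=lambda d: score(query, query_entities, docs[d][0], docs[d][1]),
    reverse=True,
)
print(ranking)
```

Both toy documents match the query terms equally well, so in this example only the entity evidence separates them.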
We start the tutorial with a brief overview of the different types of knowledge bases, their structure, and the information contained in popular general-purpose and domain-specific knowledge bases. In particular, we focus on how entity-centric information is represented in the knowledge base through names, terms, relations, and type taxonomies. Next, we provide a recap of ad-hoc object retrieval from knowledge graphs as well as entity linking and retrieval, the essential technology on which the remainder of the tutorial builds. We then cover the essential components of successful entity linking systems, including the collection of entity name information and techniques for disambiguation with contextual entity mentions. We present the details of four previously proposed systems that successfully leverage knowledge bases to improve ad-hoc document retrieval. These systems combine the notion of entity retrieval and semantic search on the one hand with text retrieval models and entity linking on the other. Finally, we touch on entity aspects and links in the knowledge graph, as they can help to understand the entities’ context.
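The two entity linking components mentioned above, collecting entity names and disambiguating mentions by context, can be sketched as follows. This is a minimal illustration, not any of the systems presented in the tutorial: the toy knowledge base, the entity identifiers, and the word-overlap disambiguation heuristic are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy knowledge base: entity id -> surface names and descriptive terms.
TOY_KB = {
    "Jaguar_(animal)": {
        "names": {"jaguar", "panthera onca"},
        "terms": "large cat species native to the americas rainforest predator",
    },
    "Jaguar_Cars": {
        "names": {"jaguar", "jaguar cars"},
        "terms": "british manufacturer of luxury cars vehicles engine",
    },
}


def build_name_dictionary(kb):
    """Map each surface name to the set of entities it may refer to."""
    names = {}
    for entity_id, record in kb.items():
        for name in record["names"]:
            names.setdefault(name, set()).add(entity_id)
    return names


def disambiguate(mention, context, kb, name_dict):
    """Pick the candidate whose KB terms overlap most with the mention's context."""
    candidates = name_dict.get(mention.lower(), set())
    if not candidates:
        return None
    context_counts = Counter(context.lower().split())

    def overlap(entity_id):
        # Count context tokens that also appear among the entity's KB terms.
        return sum(context_counts[t] for t in set(kb[entity_id]["terms"].split()))

    # Sort candidates first so that ties are broken deterministically by id.
    return max(sorted(candidates), key=overlap)


name_dict = build_name_dictionary(TOY_KB)
linked = disambiguate(
    "Jaguar", "the jaguar is a predator of the rainforest", TOY_KB, name_dict
)
print(linked)
```

The name dictionary yields both candidates for the ambiguous mention "Jaguar"; the surrounding words then decide between them. Real systems typically replace the overlap count with stronger context models, but the two-stage structure is the same.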
This tutorial is the first to compile, summarize, and disseminate progress in this emerging area. We provide an overview of state-of-the-art methods and outline open research problems to encourage new contributions.