Special issue on knowledge graphs and semantics in text analysis and retrieval

Knowledge graphs are an effective way to store semantics in a structured format that is easily used by computer systems. In the past few decades, work across different research communities has led to scalable knowledge acquisition techniques for building large-scale knowledge graphs. The result is the emergence of large publicly available knowledge graphs (KGs) such as Wikidata, DBpedia, Freebase, and others. While knowledge graphs are designed to support a wide set of different applications, this special issue focuses on the use case of text retrieval and analysis.

Utilizing knowledge graphs for text analysis requires effective alignment techniques that associate segments of unstructured text with entries in the knowledge graph, for example using entity extraction and linking algorithms. A wide range of approaches that combine query-document representations and machine learning repeatedly demonstrate significant improvements for such tasks across diverse domains. The goal of this special issue is to summarize recent progress in research and practice in constructing, grounding, and utilizing knowledge graphs and similar semantic resources for text retrieval and analysis applications. The scope includes acquisition, alignment, and utilization of knowledge graphs and other semantic resources for the purpose of optimizing end-to-end performance of information retrieval systems.

For this special issue we selected six articles out of 23 submissions. Each article was reviewed by at least three reviewers and underwent at least one revision. More literature on how to effectively use knowledge graphs in information retrieval can be found in the proceedings of the KG4IR Workshop series.

  • L. Dietz, C. Xiong, J. Dalton, and E. Meij, “Special issue on knowledge graphs and semantics in text analysis and retrieval,” Information Retrieval Journal, 2019.
    [Bibtex]
    @article{IRJ:2019:Dietz,
    Author = {Dietz, Laura and Xiong, Chenyan and Dalton, Jeff and Meij, Edgar},
    Date-Added = {2019-03-12 20:19:31 +0000},
    Date-Modified = {2019-03-12 20:19:39 +0000},
    Day = {04},
    Doi = {10.1007/s10791-019-09354-z},
    Issn = {1573-7659},
    Journal = {Information Retrieval Journal},
    Month = {Mar},
    Title = {Special issue on knowledge graphs and semantics in text analysis and retrieval},
    Url = {https://doi.org/10.1007/s10791-019-09354-z},
    Year = {2019},
    Bdsk-Url-1 = {https://doi.org/10.1007/s10791-019-09354-z}}

Overview of The First Workshop on Knowledge Graphs and Semantics for Text Retrieval and Analysis (KG4IR)

Knowledge graphs have been used throughout the history of information retrieval for a variety of tasks. Advances in knowledge acquisition and alignment technology in the last few years have given rise to a body of new approaches for utilizing knowledge graphs in text retrieval tasks. This report presents the motivation, output, and outlook of the first workshop on Knowledge Graphs and Semantics for Text Retrieval and Analysis, which was co-located with SIGIR 2017 in Tokyo, Japan. We aim to assess where we stand today, what the future directions are, and which preconditions could lead to further performance increases.

  • L. Dietz, C. Xiong, and E. Meij, “Overview of the first workshop on knowledge graphs and semantics for text retrieval and analysis (KG4IR),” SIGIR Forum, vol. 51, iss. 3, pp. 139–144, 2018.
    [Bibtex]
    @article{Forum:2018:Dietz,
    Acmid = {3190601},
    Address = {New York, NY, USA},
    Author = {Dietz, Laura and Xiong, Chenyan and Meij, Edgar},
    Date-Added = {2018-07-26 18:22:37 +0000},
    Date-Modified = {2018-07-26 18:22:48 +0000},
    Doi = {10.1145/3190580.3190601},
    Issn = {0163-5840},
    Issue_Date = {December 2017},
    Journal = {SIGIR Forum},
    Month = 2,
    Number = {3},
    Numpages = {6},
    Pages = {139--144},
    Publisher = {ACM},
    Title = {Overview of The First Workshop on Knowledge Graphs and Semantics for Text Retrieval and Analysis ({KG4IR})},
    Url = {http://doi.acm.org/10.1145/3190580.3190601},
    Volume = {51},
    Year = {2018},
    Bdsk-Url-1 = {http://doi.acm.org/10.1145/3190580.3190601},
    Bdsk-Url-2 = {https://doi.org/10.1145/3190580.3190601}}

Do support groups members disclose less to their partners? The dynamics of HIV disclosure in four African countries

To appear in BMC Public Health.

Background: Recent efforts to curtail the HIV epidemic in Africa have emphasized preventing sexual transmission to partners through antiretroviral therapy. A component of current strategies is disclosure to partners; understanding its motivations will therefore help maximise results. This study examines the rates, dynamics and consequences of partner disclosure in Burkina Faso, Kenya, Malawi and Uganda, with special attention to the role of support groups and stigma in disclosure.

Methods: The study employs mixed methods, including a cross-sectional client survey of counseling and testing services, focus groups, and in-depth interviews with HIV-positive individuals in stable partnerships in Burkina Faso, Kenya, Malawi and Uganda, recruited at healthcare facilities offering HIV testing.

Results: Rates of disclosure to partners varied between countries (32.7% – 92.7%). The lowest rate was reported in Malawi. Reasons for disclosure included preventing the transmission of HIV, the need for care, and upholding the integrity of the relationship. Fear of stigma was an important reason for non-disclosure. Women reported experiencing more negative reactions when disclosing to partners. Disclosure was positively associated with living in urban areas, higher education levels, and being male, while being negatively associated with membership to support groups.

Conclusions: Understanding the reasons for disclosure and recognizing the role of support groups in the process can help improve current prevention efforts, which increasingly focus on treatment as prevention as a way to halt new infections. Support groups can help spread secondary prevention messages by explaining to their members that antiretroviral treatment has benefits for HIV-positive individuals and their partners. Home-based testing can further facilitate partner disclosure, as couples can test together and be counseled jointly.

Women’s views on consent, counseling and confidentiality in PMTCT: a mixed-methods study in four African countries

Accepted subject to revisions.

Ambitious UN goals to reduce the mother-to-child transmission of HIV have not been met in much of Sub-Saharan Africa. This paper focuses on the quality of information provision and counseling and disclosure patterns in Burkina Faso, Kenya, Malawi and Uganda to identify how services can be improved to enable better PMTCT outcomes.

Mapping queries to the Linking Open Data cloud: A case study using DBpedia

We introduce the task of mapping search engine queries to DBpedia, a major linking hub in the Linking Open Data cloud. We propose and compare various methods for addressing this task, using a mixture of information retrieval and machine learning techniques. Specifically, we present a supervised machine learning-based method to determine which concepts are intended by a user issuing a query. The concepts are obtained from an ontology and may be used to provide contextual information, related concepts, or navigational suggestions to the user submitting the query. Our approach first ranks candidate concepts using a language modeling for information retrieval framework. We then extract query, concept, and search-history feature vectors for these concepts. Using manual annotations we inform a machine learning algorithm that learns how to select concepts from the candidates given an input query. Simply performing a lexical match between the queries and concepts is found to perform poorly and so does using retrieval alone, i.e., omitting the concept selection stage. Our proposed method significantly improves upon these baselines and we find that support vector machines are able to achieve the best performance out of the machine learning algorithms evaluated.
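The first stage described above, ranking candidate concepts with a language modeling approach, can be illustrated with a minimal sketch. It assumes query-likelihood scoring with Dirichlet smoothing over a short textual description of each concept; the abstract does not specify the smoothing method, and the function name `rank_concepts`, the parameter `mu`, and the toy concept texts are hypothetical. The supervised concept selection stage is not shown.

```python
import math
from collections import Counter


def rank_concepts(query, concepts, mu=100.0):
    """Rank concept ids by Dirichlet-smoothed query log-likelihood.

    query:    free-text query string
    concepts: dict mapping a concept id to a textual description
    mu:       smoothing parameter (an assumed value, not from the paper)
    """
    # Term frequencies per concept and for the whole collection.
    docs = {cid: Counter(text.lower().split()) for cid, text in concepts.items()}
    collection = Counter()
    for tf in docs.values():
        collection.update(tf)
    total = sum(collection.values()) or 1  # avoid division by zero

    scores = {}
    for cid, tf in docs.items():
        length = sum(tf.values())
        score = 0.0
        for term in query.lower().split():
            # Floor the collection probability so log() stays defined
            # for terms that occur in no concept description.
            p_coll = max(collection[term] / total, 1e-9)
            score += math.log((tf[term] + mu * p_coll) / (length + mu))
        scores[cid] = score
    # Highest log-likelihood first.
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    toy = {
        "Amsterdam": "amsterdam capital city of the netherlands",
        "Rotterdam": "rotterdam port city in the netherlands",
        "Python": "python programming language",
    }
    print(rank_concepts("amsterdam city", toy))
```

In a two-stage setup like the one in the abstract, the top-ranked candidates from such a function would then be passed, together with their feature vectors, to a learned classifier that decides which concepts to keep.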

  • E. Meij, M. Bron, L. Hollink, B. Huurnink, and M. de Rijke, “Mapping queries to the Linking Open Data cloud: a case study using DBpedia,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 9, iss. 4, pp. 418–433, 2011.
    [Bibtex]
    @article{JWS:2011:meij,
    Abstract = {We introduce the task of mapping search engine queries to DBpedia, a major linking hub in the Linking Open Data cloud. We propose and compare various methods for addressing this task, using a mixture of information retrieval and machine learning techniques. Specifically, we present a supervised machine learning-based method to determine which concepts are intended by a user issuing a query. The concepts are obtained from an ontology and may be used to provide contextual information, related concepts, or navigational suggestions to the user submitting the query. Our approach first ranks candidate concepts using a language modeling for information retrieval framework. We then extract query, concept, and search-history feature vectors for these concepts. Using manual annotations we inform a machine learning algorithm that learns how to select concepts from the candidates given an input query. Simply performing a lexical match between the queries and concepts is found to perform poorly and so does using retrieval alone, i.e., omitting the concept selection stage. Our proposed method significantly improves upon these baselines and we find that support vector machines are able to achieve the best performance out of the machine learning algorithms evaluated.},
    Author = {Edgar Meij and Marc Bron and Laura Hollink and Bouke Huurnink and Maarten de Rijke},
    Date-Added = {2011-11-25 08:45:19 +0100},
    Date-Modified = {2012-10-28 21:59:08 +0000},
    Doi = {10.1016/j.websem.2011.04.001},
    Issn = {1570-8268},
    Journal = {Web Semantics: Science, Services and Agents on the World Wide Web},
    Keywords = {Information retrieval},
    Number = {4},
    Pages = {418--433},
    Title = {Mapping queries to the {Linking Open Data} cloud: A case study using {DBpedia}},
    Url = {http://www.sciencedirect.com/science/article/pii/S1570826811000187},
    Volume = {9},
    Year = {2011},
    Bdsk-Url-1 = {http://www.sciencedirect.com/science/article/pii/S1570826811000187},
    Bdsk-Url-2 = {http://dx.doi.org/10.1016/j.websem.2011.04.001}}

Result diversification based on query-specific cluster ranking

Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification is restricted to documents belonging to clusters that potentially contain a high percentage of relevant documents. Empirical results show that the proposed framework improves the performance of several existing diversification methods. The framework also gives rise to a simple yet effective cluster-based approach to result diversification that selects documents from different clusters to be included in a ranked list in a round robin fashion. We describe a set of experiments aimed at thoroughly analyzing the behavior of the two main components of the proposed diversification framework, ranking and selecting clusters for diversification. Both components have a crucial impact on the overall performance of our framework, but ranking clusters plays a more important role than selecting clusters. We also examine properties that clusters should have in order for our diversification framework to be effective. Most relevant documents should be contained in a small number of high-quality clusters, while there should be no dominantly large clusters. Also, documents from these high-quality clusters should have a diverse content. These properties are strongly correlated with the overall performance of the proposed diversification framework.
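The round-robin selection mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: clusters are assumed to arrive already ranked by estimated quality, each cluster is a list of document ids ordered by retrieval score, and the names `round_robin_diversify` and `top_n_clusters` are hypothetical.

```python
from collections import deque


def round_robin_diversify(ranked_clusters, k, top_n_clusters=None):
    """Build a result list by taking one document per cluster in turn.

    ranked_clusters: list of clusters, best cluster first; each cluster is
                     a list of document ids, best document first
    k:               number of documents to return
    top_n_clusters:  if given, restrict selection to the top-n clusters
                     (an assumed way to model the cluster selection step)
    """
    if top_n_clusters is not None:
        ranked_clusters = ranked_clusters[:top_n_clusters]

    # One queue per non-empty cluster, kept in cluster-rank order.
    queues = deque(deque(cluster) for cluster in ranked_clusters if cluster)
    result, seen = [], set()

    while queues and len(result) < k:
        current = queues.popleft()
        doc = current.popleft()
        if doc not in seen:  # skip documents that occur in several clusters
            seen.add(doc)
            result.append(doc)
        if current:  # cluster still has documents: send it to the back
            queues.append(current)
    return result


if __name__ == "__main__":
    clusters = [["d1", "d4"], ["d2"], ["d3", "d5", "d6"]]
    print(round_robin_diversify(clusters, k=5))
```

Restricting the loop to the highest-ranked clusters reflects the idea in the abstract that diversification should draw only on clusters likely to contain many relevant documents.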

  • J. He, E. Meij, and M. de Rijke, “Result diversification based on query-specific cluster ranking,” J. Am. Soc. Inf. Sci., vol. 62, iss. 3, pp. 550–571, 2011.
    [Bibtex]
    @article{JASIST:2011:he,
    Abstract = {Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification is restricted to documents belonging to clusters that potentially contain a high percentage of relevant documents. Empirical results show that the proposed framework improves the performance of several existing diversification methods. The framework also gives rise to a simple yet effective cluster-based approach to result diversification that selects documents from different clusters to be included in a ranked list in a round robin fashion. We describe a set of experiments aimed at thoroughly analyzing the behavior of the two main components of the proposed diversification framework, ranking and selecting clusters for diversification. Both components have a crucial impact on the overall performance of our framework, but ranking clusters plays a more important role than selecting clusters. We also examine properties that clusters should have in order for our diversification framework to be effective. Most relevant documents should be contained in a small number of high-quality clusters, while there should be no dominantly large clusters. Also, documents from these high-quality clusters should have a diverse content. These properties are strongly correlated with the overall performance of the proposed diversification framework.},
    Address = {New York, NY, USA},
    Author = {He, Jiyin and Meij, Edgar and de Rijke, Maarten},
    Citeulike-Article-Id = {9425102},
    Citeulike-Linkout-0 = {http://portal.acm.org/citation.cfm?id=1952338},
    Citeulike-Linkout-1 = {http://dx.doi.org/10.1002/asi.21468},
    Date-Added = {2011-10-20 10:40:50 +0200},
    Date-Modified = {2012-10-28 21:59:28 +0000},
    Doi = {10.1002/asi.21468},
    Issn = {1532-2882},
    Journal = {J. Am. Soc. Inf. Sci.},
    Keywords = {todo},
    Number = {3},
    Pages = {550--571},
    Posted-At = {2011-10-20 09:40:35},
    Priority = {2},
    Publisher = {Wiley Subscription Services, Inc., A Wiley Company},
    Title = {Result diversification based on query-specific cluster ranking},
    Url = {http://dx.doi.org/10.1002/asi.21468},
    Volume = {62},
    Year = {2011},
    Bdsk-Url-1 = {http://dx.doi.org/10.1002/asi.21468}}

Conceptual language models for domain-specific retrieval

Over the years, various meta-languages have been used to manually enrich documents with conceptual knowledge of some kind. Examples include keyword assignment to citations or, more recently, tags to websites. In this paper we propose generative concept models as an extension to query modeling within the language modeling framework, which leverages these conceptual annotations to improve retrieval. By means of relevance feedback the original query is translated into a conceptual representation, which is subsequently used to update the query model.

Extensive experimental work on five test collections in two domains shows that our approach gives significant improvements in terms of recall, initial precision and mean average precision with respect to a baseline without relevance feedback. On one test collection, it is also able to outperform a text-based pseudo-relevance feedback approach based on relevance models. On the other test collections it performs similarly to relevance models. Overall, conceptual language models have the added advantage of offering query and browsing suggestions in the form of conceptual annotations. In addition, the internal structure of the meta-language can be exploited to add related terms.

Our contributions are threefold. First, an extensive study is conducted on how to effectively translate a textual query into a conceptual representation. Second, we propose a method for updating a textual query model using the concepts in the conceptual representation. Finally, we provide an extensive analysis of when and how this conceptual feedback improves retrieval.

  • E. Meij, D. Trieschnigg, M. de Rijke, and W. Kraaij, “Conceptual language models for domain-specific retrieval,” Inf. Process. Manage., vol. 46, iss. 4, pp. 448–469, 2010.
    [Bibtex]
    @article{IPM:2010:Meij,
    Address = {Tarrytown, NY, USA},
    Author = {Meij, Edgar and Trieschnigg, Dolf and de Rijke, Maarten and Kraaij, Wessel},
    Date-Added = {2011-10-12 18:31:55 +0200},
    Date-Modified = {2011-10-12 18:31:55 +0200},
    Doi = {10.1016/j.ipm.2009.09.005},
    Issn = {0306-4573},
    Journal = {Inf. Process. Manage.},
    Number = {4},
    Pages = {448--469},
    Publisher = {Pergamon Press, Inc.},
    Title = {Conceptual language models for domain-specific retrieval},
    Volume = {46},
    Year = {2010},
    Bdsk-Url-1 = {http://dx.doi.org/10.1016/j.ipm.2009.09.005}}