
Novel Entity Discovery from Web Tables

When working with any knowledge base (KB), one has to make sure it is as complete and as up-to-date as possible. Both tasks are non-trivial, as they require recall-oriented efforts to determine which entities and relationships are missing from the KB, and therefore a significant amount of labor. Tables on the Web, on the other hand, are abundant and have the potential to assist with these tasks. In particular, we can leverage the content of such tables to discover new entities, properties, and relationships. Because web tables typically contain only raw textual content, we first need to determine which cells refer to which known entities, a task we dub table-to-KB matching. This first task aims to infer table semantics by linking table cells and column headings to elements of a KB. We propose a feature-based method and demonstrate on two public test collections substantial improvements over the state of the art in terms of precision, whilst also improving recall. The second task builds upon these linked entities and properties not only to identify novel ones in the same table but also to bootstrap their type and additional relationships. We refer to this process as novel entity discovery and, to the best of our knowledge, it is the first endeavor on mining the unlinked cells in web tables. Our method identifies not only out-of-KB (“novel”) information but also novel aliases for in-KB (“known”) entities. When evaluated using three purpose-built test collections, our proposed approaches obtain a marked improvement in precision over our baselines whilst keeping recall stable.
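
To make the two-stage idea concrete, the following minimal sketch splits the cells of a table column into those that match a known KB entity and those left unlinked, the latter being the candidates that a discovery step would then examine. It is only an illustration under stated assumptions: the toy KB, the entity IDs, and the function names are hypothetical, and exact matching on normalized strings stands in for the feature-based matching method that the abstract mentions but does not detail.

```python
from typing import Dict, List, Tuple

# Hypothetical toy KB (an assumption for illustration):
# entity ID -> known surface forms (labels and aliases).
TOY_KB: Dict[str, List[str]] = {
    "Q1": ["Amsterdam"],
    "Q2": ["Oslo", "Kristiania"],
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def build_index(kb: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every normalized surface form to its entity ID."""
    index: Dict[str, str] = {}
    for entity_id, forms in kb.items():
        for form in forms:
            index[normalize(form)] = entity_id
    return index


def split_cells(
    cells: List[str], kb: Dict[str, List[str]]
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (linked, unlinked) cells.

    Exact string lookup is a stand-in here, not the paper's matcher.
    """
    index = build_index(kb)
    linked: List[Tuple[str, str]] = []
    unlinked: List[str] = []
    for cell in cells:
        entity_id = index.get(normalize(cell))
        if entity_id is None:
            unlinked.append(cell)  # candidate for novel entity discovery
        else:
            linked.append((cell, entity_id))
    return linked, unlinked


if __name__ == "__main__":
    linked, unlinked = split_cells(["amsterdam ", "Oslo", "Stavanger"], TOY_KB)
    print("linked:", linked)
    print("unlinked:", unlinked)
```

An unlinked cell is only a candidate: as the abstract notes, it may denote an out-of-KB entity or merely a novel alias of an in-KB entity, and telling these apart is what the second task is about.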

  • S. Zhang, E. Meij, K. Balog, and R. Reinanda, “Novel entity discovery from web tables,” in Proceedings of The Web Conference 2020, New York, NY, USA, 2020, pp. 1298–1308.
    Address = {New York, NY, USA},
    Author = {Zhang, Shuo and Meij, Edgar and Balog, Krisztian and Reinanda, Ridho},
    Booktitle = {Proceedings of The Web Conference 2020},
    Doi = {10.1145/3366423.3380205},
    Isbn = {9781450370233},
    Keywords = {tabular data extraction, Novel entity discovery, entity linking, KBP},
    Location = {Taipei, Taiwan},
    Numpages = {11},
    Pages = {1298--1308},
    Publisher = {Association for Computing Machinery},
    Series = {WWW '20},
    Title = {Novel Entity Discovery from Web Tables},
    Url = {https://doi.org/10.1145/3366423.3380205},
    Year = {2020}}

Improving the Utility of Knowledge Graph Embeddings with Calibration

This paper addresses machine learning models that embed knowledge graph entities and relationships toward the goal of predicting unseen triples, which is an important task because most knowledge graphs are by nature incomplete. We posit that while offline link prediction accuracy using embeddings has been steadily improving on benchmark datasets, such embedding models have limited practical utility in real-world knowledge graph completion tasks because it is not clear when their predictions should be accepted or trusted. To this end, we propose to calibrate knowledge graph embedding models to output reliable confidence estimates for predicted triples. In crowdsourcing experiments, we demonstrate that calibrated confidence scores can make knowledge graph embeddings more useful to practitioners and data annotators in knowledge graph completion tasks. We also release two resources from our evaluation tasks: an enriched version of the FB15K benchmark and a new knowledge graph dataset extracted from Wikidata.
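
As an illustration of what calibrating a scoring model can look like, the sketch below applies Platt scaling, one widely used post-hoc calibration technique: a logistic function is fitted on held-out (score, label) pairs so that raw triple scores map to probabilities. The abstract does not say which calibration technique the authors use, so this is an assumed stand-in; the function names, the learning rate, and the toy scores are all hypothetical.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(
    scores: Sequence[float],
    labels: Sequence[int],
    lr: float = 0.1,
    epochs: int = 2000,
) -> Tuple[float, float]:
    """Fit p(true | score) = sigmoid(a * score + b) by gradient descent on log loss.

    `scores` are raw model scores for held-out triples; `labels` are 1 for
    triples judged true and 0 otherwise. Platt scaling is an assumption
    here, not necessarily the method used in the paper.
    """
    if len(scores) == 0 or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrated_confidence(score: float, a: float, b: float) -> float:
    """Map a raw triple score to a calibrated probability."""
    return sigmoid(a * score + b)


if __name__ == "__main__":
    # Hypothetical held-out scores and correctness labels.
    held_out_scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
    held_out_labels = [0, 0, 0, 1, 0, 1, 1, 1]
    a, b = fit_platt(held_out_scores, held_out_labels)
    for s in (-2.0, 0.0, 2.0):
        print(s, round(calibrated_confidence(s, a, b), 3))
```

With confidences of this kind, a practitioner could accept predicted triples above a chosen threshold and route the rest to annotators, which is the sort of accept-or-trust decision the abstract argues raw scores do not support.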

  • T. Safavi, D. Koutra, and E. Meij, “Improving the utility of knowledge graph embeddings with calibration,” arXiv:2004.01168 [cs.AI], 2020.
    Archiveprefix = {arXiv},
    Author = {Tara Safavi and Danai Koutra and Edgar Meij},
    Eprint = {2004.01168},
    Primaryclass = {cs.AI},
    Title = {Improving the Utility of Knowledge Graph Embeddings with Calibration},
    Year = {2020}}