Edgar Meij

semantic search research ッ

Linking queries to entities

24/02/2014

I’m happy to announce we’re releasing a new test collection for linking web queries (within user sessions) to Wikipedia entities. About half of the queries in this dataset are sampled from Yahoo search logs; the other half come from the TREC Session track. Check out the L24 dataset on Yahoo Webscope, or drop me a line for more information. Below you’ll find an excerpt of the README text associated with it.

With this dataset you can train, test, and benchmark entity linking systems on the task of linking web search queries – within the context of a search session – to entities. Entities are a key enabling component for semantic search, as many information needs can be answered by returning a list of entities, their properties, and/or their relations. A first step in any such scenario is to determine which entities appear in a query – a process commonly referred to as named entity resolution, named entity disambiguation, or semantic linking.

This dataset allows researchers and other practitioners to evaluate their systems for linking web search engine queries to entities. The dataset contains manually identified links to entities in the form of Wikipedia articles and provides the means to train, test, and benchmark such systems using manually created, gold standard data. By releasing this dataset publicly, we aim to foster research into entity linking systems for web search queries. To this end, we also include sessions and queries from the TREC Session track (years 2010–2013). Moreover, since the linked entities are aligned with a specific part of each query (a “span”), this data can also be used to evaluate systems that identify spans in queries, i.e., that perform query segmentation for web search queries in the context of search sessions.
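
As an illustration of what such benchmarking could look like, here is a minimal sketch that scores predicted links against gold links using micro-averaged precision, recall, and F1. The README excerpt does not prescribe a metric or a file format, so the `(query id, span, Wikipedia title)` triple layout, the function name, and the toy data are all assumptions made for this example.

```python
from typing import Iterable, Set, Tuple

# Hypothetical in-memory layout (not the dataset's file format):
# (query id, span text, Wikipedia article title).
Link = Tuple[str, str, str]


def precision_recall_f1(
    gold: Iterable[Link], predicted: Iterable[Link]
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact link matches."""
    gold_set: Set[Link] = set(gold)
    pred_set: Set[Link] = set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only.
gold_links = [
    ("q1", "france", "France"),
    ("q1", "world cup 98", "1998 FIFA World Cup"),
    ("q2", "obama", "Barack Obama"),
]
predicted_links = [
    ("q1", "france", "France"),
    ("q2", "obama", "Obama (surname)"),
]

precision, recall, f1 = precision_recall_f1(gold_links, predicted_links)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Dropping the entity from each triple and comparing only `(query id, span)` pairs would give a comparable score for the query segmentation use case.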

The key properties of the dataset are as follows.

  • Queries are taken from Yahoo US Web Search and from the TREC Session track (2010–2013).
  • There are 2635 queries in 980 sessions, 7482 spans, and 5964 links to Wikipedia articles in this dataset.
  • The annotations include the part of the query (the “span”) that is linked to each Wikipedia article. This information can also be used for query segmentation experiments.
  • The annotators have identified the “main” entity or entities for each query, if available.
  • The annotators also labeled each query as non-English, navigational, quote-or-question, adult, or ambiguous where applicable. They also flagged out-of-Wikipedia entities, i.e., cases where an entity is mentioned in a query but no suitable Wikipedia article exists.
  • The file includes session information: each session consists of an anonymized id, the initial query, and all the queries issued within the same session, with their relative date/timestamp if available.
  • Sessions are demarcated using a 30-minute time-out.
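
To make the 30-minute time-out concrete, here is a minimal sketch of how a query stream could be split into sessions. It assumes the time-out is measured as inactivity between consecutive queries and that a gap of exactly 30 minutes stays within the same session; the README excerpt does not spell out either detail. The `split_sessions` function and the sample events are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Hypothetical event layout: (timestamp, query text).
Event = Tuple[datetime, str]

SESSION_TIMEOUT = timedelta(minutes=30)


def split_sessions(
    events: Iterable[Event], timeout: timedelta = SESSION_TIMEOUT
) -> List[List[Event]]:
    """Group one user's queries into sessions separated by inactivity gaps."""
    sessions: List[List[Event]] = []
    for timestamp, query in sorted(events, key=lambda event: event[0]):
        # Compare against the last query of the current session (assumption).
        if sessions and timestamp - sessions[-1][-1][0] <= timeout:
            sessions[-1].append((timestamp, query))
        else:
            sessions.append([(timestamp, query)])
    return sessions


# Toy events, invented for illustration only.
sample_events = [
    (datetime(2014, 2, 24, 10, 0), "entity linking"),
    (datetime(2014, 2, 24, 10, 10), "entity linking dataset"),
    (datetime(2014, 2, 24, 10, 50), "trec session track"),
]

for number, session in enumerate(split_sessions(sample_events), start=1):
    print(number, [query for _, query in session])
```

The third query arrives 40 minutes after the second, so it opens a new session under these assumptions.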

This is the website of Edgar Meij. I lead several groups of researchers and engineers at Bloomberg working on knowledge graphs, question answering, information retrieval, machine learning, and more…
