Edgar Meij

semantic search research ッ

TREC

DutchHatTrick: Semantic query modeling, ConText, section detection, and match score maximization.

25/01/2012 Publications Unrefereed

This report discusses the collaborative work of Erasmus MC, the University of Twente, and the University of Amsterdam on the TREC 2011 Medical track. Here, the task is to retrieve patient visits from the University of Pittsburgh NLP Repository for 35 topics. The repository consists of 101,711 patient reports, and each patient visit is recorded in one or more reports.

Because the training set provided by the track organization was small and not made available until quite late in the competition, we decided to create a small training set ourselves. Not only did this allow us to test several ideas before submitting runs to TREC, it also led to several insights into the data. One finding was that synonyms are widely used. Query expansion was therefore deemed essential to achieve reasonable performance. Query expansion has been used before in Information Retrieval (IR), and is often divided into statistical and knowledge-based query expansion. Statistical query expansion uses data derived from the corpus itself; a well-known example is pseudo-relevance feedback. In contrast, we investigated knowledge-based query expansion, which uses a knowledge base such as an ontology or a dictionary to find related terms. This type of query expansion has not always proven to be successful. For instance, Hersh et al. found a decrease in overall search performance when using the Unified Medical Language System (UMLS) to find related terms. Liu et al. found slight improvements with scenario-specific expansion strategies using UMLS. In a previous TREC track, we also found reduced performance when using concept-based query expansion, but found slightly improved results when using an approach combining concepts with a statistical model of related words. Similarly, Zhou found promising results when using a combination of both the original words in the text and the synonyms found for concepts in the text.
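To make the statistical side of this distinction concrete, here is a minimal sketch of pseudo-relevance feedback. It is not the method used in the report: the term-overlap scoring, the whitespace tokenizer, and the names `pseudo_relevance_feedback`, `top_k`, and `n_terms` are all assumptions chosen for illustration.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems do much more).
    return text.lower().split()


def overlap_score(query_terms, doc_terms):
    # Count how often any query term occurs in the document.
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)


def pseudo_relevance_feedback(query, docs, top_k=2, n_terms=2):
    """Expand `query` with frequent terms from the top-ranked documents.

    The top_k documents of an initial ranking are assumed to be relevant,
    and their most frequent non-query terms are appended to the query.
    """
    query_terms = tokenize(query)
    # Initial retrieval: rank documents by simple term overlap.
    ranked = sorted(
        docs,
        key=lambda d: overlap_score(query_terms, tokenize(d)),
        reverse=True,
    )
    # Collect candidate expansion terms from the assumed-relevant documents.
    feedback = Counter()
    for doc in ranked[:top_k]:
        feedback.update(t for t in tokenize(doc) if t not in query_terms)
    expansion = [term for term, _ in feedback.most_common(n_terms)]
    return query_terms + expansion
```

The expansion terms come only from the corpus itself, which is what separates this family of methods from the knowledge-based expansion investigated in the report.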

An often-used resource for knowledge-based query expansion in the biomedical domain is the UMLS. However, initial explorations indicated that there is only limited overlap between terms used in topics and medical records and terms found in the UMLS. The main reason for this appears to be that the UMLS is mainly constructed from vocabularies used in classifying clinical data and is not intended to be used in text mining. Terms in the UMLS tend to be more specific than what a physician would use in free-text reporting. For instance, a physician might use the term "upper endoscopy", but this term is not found in the UMLS. Instead, the term "upper GI endoscopy" is found. We have therefore explored a different source of synonyms: Wikipedia. We expected Wikipedia to have better coverage of the terms encountered in medical records.
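As a rough illustration of knowledge-based query expansion with a synonym source, the sketch below rewrites a query using a small lookup table. The table `SYNONYMS` and the function `expand_query` are hypothetical; the single entry reuses the "upper endoscopy" example from the text, and how synonyms were actually extracted from Wikipedia is not shown here.

```python
import re

# Hypothetical synonym table (an assumption for illustration). In practice
# the entries would be derived from a knowledge source such as Wikipedia.
SYNONYMS = {
    "upper endoscopy": ["upper gi endoscopy"],
}


def expand_query(query, synonyms):
    """Return the lowercased query plus variants with known phrases
    replaced by their synonyms."""
    normalized = query.lower()
    variants = [normalized]
    for phrase, alternatives in synonyms.items():
        # Match whole phrases only, so that substrings of longer words
        # do not trigger an expansion.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b")
        if not pattern.search(normalized):
            continue
        for alt in alternatives:
            # A function replacement avoids interpreting backslashes in alt.
            variant = pattern.sub(lambda _m, a=alt: a, normalized)
            if variant not in variants:
                variants.append(variant)
    return variants
```

Keeping the original query alongside the rewritten variants means retrieval can match either wording, which is useful when topics and records use different terms for the same procedure.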

  • M. Schuemie, D. Trieschnigg, and E. Meij, “DutchHatTrick: semantic query modeling, ConText, section detection, and match score maximization,” in The Twentieth Text REtrieval Conference, 2012.
    BibTeX:
    @inproceedings{TREC:2011:schuemie,
    Author = {Schuemie, M. and Trieschnigg, Dolf and Meij, Edgar},
    Booktitle = {The Twentieth Text REtrieval Conference},
    Date-Added = {2011-10-22 12:14:30 +0200},
    Date-Modified = {2013-05-22 11:44:30 +0000},
    Month = {January},
    Series = {TREC 2011},
    Title = {{DutchHatTrick:} Semantic query modeling, {ConText}, section detection, and match score maximization},
    Year = {2012}}


Welcome!

This is the website of Edgar Meij. I lead several groups of researchers and engineers at Bloomberg working on knowledge graphs, question answering, information retrieval, machine learning, and more…
