Combining Thesauri-based Methods for Biomedical Retrieval

This paper describes our participation in the TREC 2005 Genomics track. We took part in the ad hoc retrieval task and aimed to integrate thesauri into the retrieval model. We developed three thesauri-based methods, two of which use the existing MeSH thesaurus: the first applies blind relevance feedback to MeSH terms, and the second uses an index of the MeSH thesaurus for query expansion. The third method uses a dynamically generated lookup list from which acronyms and synonyms can be inferred. We show that, although each method yields only minor improvements in retrieval performance when applied individually, their combination works best and delivers significant improvements over the baseline.
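The abstract does not detail how feedback on MeSH terms is carried out. As a minimal sketch of one common form of blind relevance feedback, assuming every MEDLINE document comes with its assigned MeSH headings and that the top `k` documents of a first-pass run are simply treated as relevant, the headings that occur in the most of those documents can be appended to the query. The function name, the `k` and `n` defaults and the toy data are hypothetical.

```python
from collections import Counter


def expand_with_mesh_feedback(query_terms, ranked_doc_ids, mesh_index, k=10, n=5):
    """Append the n MeSH headings that occur in the most top-k documents.

    ranked_doc_ids: document ids from a first-pass retrieval run, best first.
    mesh_index: maps a document id to the MeSH headings assigned to it
    (hypothetical input format).
    """
    counts = Counter()
    for doc_id in ranked_doc_ids[:k]:
        # Blind feedback: the top-k documents are assumed relevant. Each
        # heading is counted at most once per document.
        counts.update(set(mesh_index.get(doc_id, [])))
    # Drop headings already in the query; break ties alphabetically so the
    # expansion is deterministic.
    candidates = sorted(
        (h for h in counts if h not in query_terms),
        key=lambda h: (-counts[h], h),
    )
    return list(query_terms) + candidates[:n]


# Toy data for illustration only.
mesh_index = {
    "d1": ["Breast Neoplasms", "Genes, BRCA1"],
    "d2": ["Breast Neoplasms", "Mutation"],
    "d3": ["Mutation", "Breast Neoplasms"],
}
print(expand_with_mesh_feedback(["brca1", "cancer"], ["d1", "d2", "d3"], mesh_index, k=3, n=2))
# ['brca1', 'cancer', 'Breast Neoplasms', 'Mutation']
```

In practice the expanded query would usually down-weight the added headings relative to the original terms; the sketch leaves weighting out to keep the feedback step itself visible.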

  • E. Meij, L. H. L. IJzereef, L. A. Azzopardi, J. Kamps, and M. de Rijke, “Combining thesauri-based methods for biomedical retrieval,” in The Fourteenth Text REtrieval Conference (TREC 2005), E. M. Voorhees and L. P. Buckland, Eds., 2006.
    @inproceedings{TREC:2005:meij,
    Author = {Meij, E. and IJzereef, L.H.L. and Azzopardi, L.A. and Kamps, J. and de Rijke, M.},
    Editor = {Voorhees, E.M. and Buckland, L.P.},
    Booktitle = {The Fourteenth Text REtrieval Conference},
    Date-Added = {2011-10-12 23:16:44 +0200},
    Date-Modified = {2012-10-30 09:23:12 +0000},
    Series = {TREC 2005},
    Title = {Combining Thesauri-based Methods for Biomedical Retrieval},
    Year = {2006}}

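Van Case-Based Reasoning tot Information Retrieval; Case retrieval voor de helpdesk van een webhosting bedrijf (From Case-Based Reasoning to Information Retrieval: Case Retrieval for the Helpdesk of a Web Hosting Company)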

The helpdesk department of Hostnet, a web hosting company, receives 35 to 50 questions from its customers every day. Few off-the-shelf manuals exist for the domain in which Hostnet operates, and this is particularly noticeable at the helpdesk. The organization currently has few facilities for knowledge management or knowledge elicitation. Questions are answered and problems are solved mostly by relying on the expertise of the staff. Staff members therefore need up-to-date knowledge of a wide variety of possible questions, problem situations and solutions, and they need to be creative and flexible when handling novel questions.

Hostnet uses a ticketing system to handle customer questions. One of the many advantages of such a system is that all questions are stored along with their answers. Because Hostnet has been using the system for some time, it has collected a large amount of domain- and organization-specific knowledge. This is exactly the kind of information on which case-based reasoning focuses: case-based reasoning uses previously solved problems (cases) as a knowledge source to help solve similar cases in the future. One of the main components of any case-based reasoning system is the retrieval module, which, given a new case and a similarity measure, searches for similar stored cases. Techniques from information retrieval, such as statistical vector-space methods, can help find these similar questions.
This research analyzes to what extent previously solved cases can serve as the basis for a statistical information retrieval module in a case-based reasoning system at Hostnet, by measuring the effect of different information retrieval techniques on the retrieval results. The techniques evaluated are stemming, term weighting and combinations thereof. The organizational setting described above is not unique to Hostnet: many service-providing companies with direct customer contact are likely familiar with this situation and could benefit from the results presented here.
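A vector-space retrieval module of this kind can be sketched in a few lines. This is a minimal sketch under assumptions that the abstract does not state: cosine similarity as the similarity measure, TF-IDF as the term-weighting scheme, and a crude English suffix stripper (`toy_stem`) standing in for a real stemmer. `CaseBase`, `toy_stem` and the example questions are hypothetical.

```python
import math
import re
from collections import Counter


def toy_stem(token):
    # Crude suffix stripper standing in for a real stemmer; NOT Porter or Snowball.
    for suffix in ("ing", "es", "s", "e"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


class CaseBase:
    """Vector-space retrieval over stored helpdesk questions (hypothetical sketch)."""

    def __init__(self, questions, stem=False, tfidf=False):
        self.stem, self.tfidf = stem, tfidf
        counts = [Counter(self._tokens(q)) for q in questions]
        self.n = len(questions)
        # Document frequency: in how many stored questions each term occurs.
        self.df = Counter(t for c in counts for t in c)
        self.vectors = [self._weigh(c) for c in counts]

    def _tokens(self, text):
        tokens = re.findall(r"\w+", text.lower())
        return [toy_stem(t) for t in tokens] if self.stem else tokens

    def _weigh(self, counts):
        if not self.tfidf:
            return dict(counts)  # raw term frequency, no weighting scheme
        # tf * idf; terms that never occur in the case base are dropped.
        return {t: c * math.log(self.n / self.df[t])
                for t, c in counts.items() if self.df[t]}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    def retrieve(self, new_question, k=10):
        """Indices of the k stored questions most similar to new_question."""
        query = self._weigh(Counter(self._tokens(new_question)))
        scored = sorted(
            ((self._cosine(query, v), i) for i, v in enumerate(self.vectors)),
            key=lambda s: (-s[0], s[1]),
        )
        return [i for _, i in scored[:k]]


stored = [
    "How do I change my email password",
    "My domain name is not resolving",
    "How can I upload files with FTP",
]
cb = CaseBase(stored, stem=True, tfidf=True)
print(cb.retrieve("Changing the password of my email account", k=2))  # [0, 1]
```

The `stem` and `tfidf` flags toggle the two techniques independently, so all four configurations, including the plain baseline with neither technique, can be compared on the same set of new questions.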

The suggested approach yields adequate results: at best, 60% of new questions can be answered using the first 10 retrieved stored questions. The mean reciprocal rank of the first matching question, however, leaves room for improvement, with a value of 7 out of 10. The most important finding is that the best results are achieved when none of the aforementioned information retrieval techniques is applied. The approach needs to be improved before it can be successfully integrated into a case-based reasoning system, but it does seem viable.
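The 60% figure and the mean reciprocal rank are both measures over the rank of the first matching stored question. A minimal sketch of how such measures can be computed, assuming that for each new question we know the 1-based rank of the first matching stored question (or that none matches) and using a cutoff of 10; the function names and example ranks are hypothetical and not data from the thesis.

```python
def success_at_k(ranks, k=10):
    """Fraction of new questions whose first matching stored question is in the top k.

    ranks[i] is the 1-based rank of the first match for question i, or None
    if no stored question matches.
    """
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def mean_reciprocal_rank(ranks, cutoff=10):
    """Mean of 1/rank of the first match; misses and ranks beyond cutoff count as 0."""
    return sum(1.0 / r for r in ranks if r is not None and r <= cutoff) / len(ranks)


# Illustrative ranks for five new questions (not data from the thesis).
ranks = [1, 3, None, 2, 12]
print(success_at_k(ranks))          # 0.6
print(mean_reciprocal_rank(ranks))  # (1 + 1/3 + 1/2) / 5, about 0.367
```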

  • E. Meij, “Van case-based reasoning tot information retrieval; case retrieval voor de helpdesk van een webhosting bedrijf,” Master's thesis, University of Amsterdam, 2005.
    @mastersthesis{2005:meij,
    Author = {Edgar Meij},
    Date-Added = {2011-10-12 21:53:59 +0200},
    Date-Modified = {2011-10-12 21:55:28 +0200},
    School = {University of Amsterdam},
    Title = {Van Case-Based Reasoning tot Information Retrieval; Case retrieval voor de helpdesk van een webhosting bedrijf},
    Year = {2005}}