
Combining Concepts and Language Models for Information Access

Information retrieval has attracted growing interest since the middle of the last century. From the outset, much research has been devoted to finding optimal ways of representing both documents and queries, and to improving the ways of matching one with the other. In cases where document annotations or explicit semantics are available, matching algorithms can be informed using the concept languages in which such semantics are usually defined. These algorithms are able to match queries and documents based on both textual and semantic evidence.

Recent advances have enabled the use of rich query representations in the form of query language models. This, in turn, allows us to account for the language associated with concepts within the retrieval model in a principled and transparent manner. Developments in the semantic web community, such as the Linked Open Data cloud, have enabled the association of texts with concepts on a large scale. Taken together, these developments facilitate a move beyond manually assigned concepts in domain-specific contexts into the general domain.

This thesis investigates how one can improve information access by employing the actual use of concepts, as measured by the language that people use when they discuss them. The main contribution is a set of models and methods that enable users to retrieve and access information on a conceptual level. Through extensive evaluations, we systematically explore and thoroughly analyze the experimental results of the proposed models. Our empirical results show that a combination of top-down conceptual information and bottom-up statistical information obtains the best performance on a variety of tasks and test collections.

See http://phdthes.is/ for more information.

  • [PDF] E. Meij, “Combining concepts and language models for information access,” PhD Thesis, 2010.
    [Bibtex]
    @phdthesis{2010:meij,
    Author = {Meij, Edgar},
    Date-Added = {2011-10-20 10:18:00 +0200},
    Date-Modified = {2011-10-22 12:23:33 +0200},
    School = {University of Amsterdam},
    Title = {Combining Concepts and Language Models for Information Access},
    Year = {2010}}

 

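Van Case-Based Reasoning tot Information Retrieval; Case retrieval voor de helpdesk van een webhosting bedrijf (From Case-Based Reasoning to Information Retrieval: Case Retrieval for the Helpdesk of a Web Hosting Company)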

The helpdesk department of Hostnet, a web hosting company, receives 35 to 50 questions from its customers every day. Few off-the-shelf manuals exist for the domain in which Hostnet operates, and this is particularly noticeable at the helpdesk. Currently, the organization has only a few options for knowledge management and/or elicitation. Questions are answered and problems are solved mostly by relying on the expertise of the staff, who therefore need up-to-date knowledge of a wide variety of possible questions, problem situations, and solutions. They also need to be creative and flexible when handling novel questions.

Hostnet uses a ticketing system to handle questions from its customers. One of the many advantages of such a system is that all questions are stored along with their corresponding answers. Hostnet has been using the system for some time and has thus collected a large amount of domain- and organization-specific knowledge. This is exactly the kind of information on which the research area of case-based reasoning focuses. Case-based reasoning uses previously solved problems (cases) as a knowledge source to help solve similar cases in the future. One of the main components of any case-based reasoning system is the retrieval module, which searches for similar cases given a new case and a similarity measure. Techniques from the area of information retrieval may be used to help find these similar questions, for example by implementing statistical, vector-space-based methods.

This research analyzes to what extent previously solved cases can serve as the basis for a statistical information retrieval module of a case-based reasoning system within Hostnet, by measuring the effects of different information retrieval techniques on the results. The evaluated techniques are stemming, term weighting, and combinations thereof. The organizational setting described above is not unique to Hostnet: every service-providing company with direct customer contact is probably familiar with this situation and could benefit from the presented results.
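To make the retrieval step concrete, the following is a minimal sketch of vector-space case retrieval with optional term weighting and stemming. It is an illustration under stated assumptions, not the implementation used in the thesis: the function names, the particular IDF variant (log(N/df) + 1), and the `stem` hook are all hypothetical, and the example questions are invented.

```python
import math
from collections import Counter


def tokenize(text, stem=None):
    """Lowercase, split on non-alphanumeric characters, optionally stem."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    tokens = cleaned.split()
    # `stem` is a hypothetical hook: any callable mapping a token to its stem.
    return [stem(t) for t in tokens] if stem else tokens


def idf_table(docs_tokens):
    """Inverse document frequency per term (one common variant, assumed here)."""
    n = len(docs_tokens)
    df = Counter(t for toks in docs_tokens for t in set(toks))
    return {t: math.log(n / count) + 1.0 for t, count in df.items()}


def vectorize(tokens, idf=None):
    """Sparse term vector: raw term frequency, or TF-IDF when `idf` is given."""
    tf = Counter(tokens)
    if idf is None:
        return {t: float(c) for t, c in tf.items()}
    # Terms unseen in the stored cases get weight 0.
    return {t: c * idf.get(t, 0.0) for t, c in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, cases, k=10, use_idf=True, stem=None):
    """Return indices of up to k stored cases, most similar to the query first."""
    docs = [tokenize(c, stem) for c in cases]
    idf = idf_table(docs) if use_idf else None
    q_vec = vectorize(tokenize(query, stem), idf)
    scored = [(cosine(q_vec, vectorize(d, idf)), i) for i, d in enumerate(docs)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by index
    return [i for score, i in scored[:k] if score > 0]
```

Switching `use_idf` and `stem` on and off gives the kind of comparison the abstract describes: the same stored questions are ranked under each combination of techniques, and the rankings are then evaluated against known matching questions.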

The suggested approach yields adequate results: at best, 60% of new questions can be answered based on the first 10 retrieved stored questions. However, the mean reciprocal rank of the first matching question, with a value of 7 out of 10, leaves room for improvement. The most important conclusion is that the best results are achieved when none of the aforementioned information retrieval techniques is applied. The suggested approach needs to be improved before it can be successfully integrated into a case-based reasoning system, but it does seem viable.
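The two kinds of measure mentioned above can be sketched as follows, using the standard textbook definitions of success at k and mean reciprocal rank. Whether the thesis computed its figures in exactly this way is an assumption; the function names and the toy data in the usage note are hypothetical.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item in `ranked`, or 0.0 if none appears."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_list, relevant_set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def success_at_k(runs, k=10):
    """Fraction of queries with at least one relevant item in the top k."""
    if not runs:
        return 0.0
    hits = sum(1 for r, rel in runs if any(item in rel for item in r[:k]))
    return hits / len(runs)
```

For example, given three new questions whose first matching stored question appears at rank 2, at rank 1, and not at all, `mean_reciprocal_rank` averages 1/2, 1, and 0, and `success_at_k` counts two of the three questions as answerable from the top 10.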

  • [PDF] E. Meij, “Van case-based reasoning tot information retrieval; case retrieval voor de helpdesk van een webhosting bedrijf,” Master Thesis, 2005.
    [Bibtex]
    @mastersthesis{2005:meij,
    Author = {Meij, Edgar},
    Date-Added = {2011-10-12 21:53:59 +0200},
    Date-Modified = {2011-10-12 21:55:28 +0200},
    School = {University of Amsterdam},
    Title = {Van Case-Based Reasoning tot Information Retrieval; Case retrieval voor de helpdesk van een webhosting bedrijf.},
    Year = {2005}}