Dense Retrieval Adaptation using Target Domain Description (ICTIR 2023)
In information retrieval (IR), domain adaptation is the process of
adapting a retrieval model to a new domain whose data distribution
is different from the source domain. Existing methods in this area
focus either on unsupervised domain adaptation, where they have access
to the target document collection, or on supervised (often few-shot)
domain adaptation, where they additionally have access to (limited)
labeled data in the target domain. There is also research on
improving the zero-shot performance of retrieval models with no
adaptation. This paper introduces a category of domain adaptation
in IR that is as yet unexplored. Here, similar to the zero-shot
setting, we assume the retrieval model does not have access to the
target document collection. Unlike the zero-shot setting, however,
it does have access to a
brief textual description that explains the target domain. We define
a taxonomy of domain attributes in retrieval tasks to understand
different properties of a source domain that can be adapted to a
target domain. We introduce a novel automatic data construction
pipeline that produces a synthetic document collection, query set,
and pseudo relevance labels, given a textual domain description.
Extensive experiments on five diverse target domains show that
adapting dense retrieval models using the constructed synthetic
data leads to effective retrieval performance on the target domain.
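
To make the shape of the pipeline's output concrete, the following is a
minimal sketch of its three artifacts: a synthetic document collection,
a query set, and pseudo relevance labels, all derived from a textual
domain description. The abstract does not say how each stage is
implemented, so every name here (SyntheticDataset,
build_synthetic_dataset, toy_documents, toy_queries, overlap_score, and
the threshold parameter) is a hypothetical placeholder, and the toy
generators only stand in for whatever generation and labeling models
would be used in practice.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SyntheticDataset:
    documents: Dict[str, str]         # doc_id -> synthetic document text
    queries: Dict[str, str]           # query_id -> synthetic query text
    qrels: Dict[str, Dict[str, int]]  # query_id -> {doc_id: pseudo label}


def build_synthetic_dataset(
    description: str,
    generate_documents: Callable[[str], List[str]],
    generate_queries: Callable[[str, List[str]], List[str]],
    score: Callable[[str, str], float],
    threshold: float = 0.3,
) -> SyntheticDataset:
    """Assemble documents, queries, and pseudo labels from a description.

    The three callables are assumptions of this sketch: they mark where
    a document generator, a query generator, and a relevance scorer
    would plug in.
    """
    docs = {f"d{i}": t for i, t in enumerate(generate_documents(description))}
    generated = generate_queries(description, list(docs.values()))
    queries = {f"q{i}": t for i, t in enumerate(generated)}
    # A (query, document) pair gets pseudo label 1 when its score
    # reaches the threshold; all other pairs are left unlabeled.
    qrels = {
        qid: {did: 1 for did, d in docs.items() if score(q, d) >= threshold}
        for qid, q in queries.items()
    }
    return SyntheticDataset(docs, queries, qrels)


# Toy stand-ins (hypothetical): template text and term overlap.
def toy_documents(description: str) -> List[str]:
    topics = [w.strip(".,").lower() for w in description.split() if len(w) > 6]
    return [f"A short passage about {t}." for t in topics]


def toy_queries(description: str, documents: List[str]) -> List[str]:
    return [f"what is {doc.split()[-1].rstrip('.')}" for doc in documents]


def overlap_score(query: str, document: str) -> float:
    q = set(query.lower().split())
    d = set(document.lower().rstrip(".").split())
    return len(q & d) / len(q)


data = build_synthetic_dataset(
    "Biomedical literature about infectious diseases.",
    toy_documents, toy_queries, overlap_score,
)
print(data.qrels)
```

The resulting documents, queries, and labels are the kind of triples on
which a dense retrieval model could then be fine-tuned. The fine-tuning
step itself is outside the scope of this sketch.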