Trade-off between diversity and precision

Result diversification based on query-specific cluster ranking

Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification is restricted to documents belonging to clusters that potentially contain a high percentage of relevant documents. Empirical results show that the proposed framework improves the performance of several existing diversification methods. The framework also gives rise to a simple yet effective cluster-based approach to result diversification that selects documents from different clusters to be included in a ranked list in a round-robin fashion. We describe a set of experiments aimed at thoroughly analyzing the behavior of the two main components of the proposed diversification framework: ranking clusters and selecting clusters for diversification. Both components have a crucial impact on the overall performance of our framework, but ranking clusters plays a more important role than selecting clusters. We also examine properties that clusters should have in order for our diversification framework to be effective. Most relevant documents should be contained in a small number of high-quality clusters, while there should be no dominantly large clusters. Also, documents from these high-quality clusters should have diverse content. These properties are strongly correlated with the overall performance of the proposed diversification framework.
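The round-robin selection mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the clusters have already been ranked (best first) by some cluster ranker and that the documents inside each cluster are already ordered by retrieval score. The function name `round_robin_diversify` and the parameters `top_k_clusters` and `list_size` are hypothetical names chosen for this illustration.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel marking clusters that have run out of documents


def round_robin_diversify(ranked_clusters, top_k_clusters, list_size):
    """Build a diversified ranked list by cycling over the top-ranked clusters.

    ranked_clusters: list of clusters, best cluster first (assumed input);
                     each cluster is a list of document ids, best document first.
    top_k_clusters:  how many of the top-ranked clusters take part in diversification.
    list_size:       maximum length of the returned ranked list.
    """
    if list_size <= 0:
        return []
    selected = ranked_clusters[:top_k_clusters]
    result, seen = [], set()
    # Each "tier" holds the i-th document of every selected cluster.
    for tier in zip_longest(*selected, fillvalue=_MISSING):
        for doc in tier:
            if doc is _MISSING or doc in seen:
                continue  # skip exhausted clusters and duplicate documents
            seen.add(doc)
            result.append(doc)
            if len(result) == list_size:
                return result
    return result


clusters = [["d1", "d4", "d7"], ["d2", "d5"], ["d3"]]
print(round_robin_diversify(clusters, top_k_clusters=2, list_size=4))
```

Restricting the loop to the first `top_k_clusters` clusters mirrors the idea that diversification is applied only to clusters likely to contain many relevant documents; how those clusters are ranked and how many are selected is left open here.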

  • J. He, E. Meij, and M. de Rijke, “Result diversification based on query-specific cluster ranking,” J. Am. Soc. Inf. Sci., vol. 62, iss. 3, pp. 550-571, 2011. doi: 10.1002/asi.21468
    [Bibtex]
    @article{JASIST:2011:he,
    Abstract = {Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification is restricted to documents belonging to clusters that potentially contain a high percentage of relevant documents. Empirical results show that the proposed framework improves the performance of several existing diversification methods. The framework also gives rise to a simple yet effective cluster-based approach to result diversification that selects documents from different clusters to be included in a ranked list in a round-robin fashion. We describe a set of experiments aimed at thoroughly analyzing the behavior of the two main components of the proposed diversification framework: ranking clusters and selecting clusters for diversification. Both components have a crucial impact on the overall performance of our framework, but ranking clusters plays a more important role than selecting clusters. We also examine properties that clusters should have in order for our diversification framework to be effective. Most relevant documents should be contained in a small number of high-quality clusters, while there should be no dominantly large clusters. Also, documents from these high-quality clusters should have diverse content. These properties are strongly correlated with the overall performance of the proposed diversification framework.},
    Address = {New York, NY, USA},
    Author = {He, Jiyin and Meij, Edgar and de Rijke, Maarten},
    Citeulike-Article-Id = {9425102},
    Citeulike-Linkout-0 = {http://portal.acm.org/citation.cfm?id=1952338},
    Citeulike-Linkout-1 = {http://dx.doi.org/10.1002/asi.21468},
    Date-Added = {2011-10-20 10:40:50 +0200},
    Date-Modified = {2012-10-28 21:59:28 +0000},
    Doi = {10.1002/asi.21468},
    Issn = {1532-2882},
    Journal = {J. Am. Soc. Inf. Sci.},
    Keywords = {todo},
    Number = {3},
    Pages = {550--571},
    Posted-At = {2011-10-20 09:40:35},
    Priority = {2},
    Publisher = {Wiley Subscription Services, Inc., A Wiley Company},
    Title = {Result diversification based on query-specific cluster ranking},
    Url = {http://dx.doi.org/10.1002/asi.21468},
    Volume = {62},
    Year = {2011},
    Bdsk-Url-1 = {http://dx.doi.org/10.1002/asi.21468}}