Mining, ranking and recommending entity aspects

Entity queries constitute a large fraction of web search queries and most of these queries are in the form of an entity mention plus some context terms that represent an intent in the context of that entity. We refer to these entity-oriented search intents as entity aspects. Recognizing entity aspects in a query can improve various search applications such as providing direct answers, diversifying search results, and recommending queries. In this paper we focus on the tasks of identifying, ranking, and recommending entity aspects, and propose an approach that mines, clusters, and ranks such aspects from query logs.
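As a rough illustration of the mine, cluster and rank pipeline, here is a minimal Python sketch. It is not the method from the paper: it assumes an aspect can be approximated by the words left in a query once the entity mention is removed, groups candidates that share the same bag of words into one cluster, and ranks clusters by how many queries they cover. The function name `mine_aspects` is hypothetical.

```python
from collections import Counter, defaultdict


def mine_aspects(queries, entity):
    """Mine, cluster and rank candidate aspects of `entity` from raw queries.

    Hypothetical sketch: an aspect candidate is whatever remains of a query
    after the entity mention is removed; candidates with the same bag of words
    ("cheap hotels" / "hotels cheap") form one cluster, and clusters are ranked
    by how many queries they cover.
    """
    needle = f" {entity.lower()} "
    clusters = defaultdict(Counter)  # bag of words -> surface variants seen
    for query in queries:
        padded = " " + " ".join(query.lower().split()) + " "  # normalise case/spaces
        if needle not in padded:
            continue  # query does not mention the entity as whole words
        context = " ".join(padded.replace(needle, " ").split())
        if context:  # skip queries consisting of the bare entity mention
            clusters[frozenset(context.split())][context] += 1
    ranked = []
    for variants in clusters.values():
        label, _ = variants.most_common(1)[0]  # most frequent variant names the cluster
        ranked.append((label, sum(variants.values())))
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

Word-order variants such as `cheap hotels barcelona` and `barcelona hotels cheap` end up in the same cluster, so their counts are pooled before ranking.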

Dynamic query modeling for related content finding

While watching television, people increasingly consume additional content related to what they are watching. We consider the task of finding video content related to a live television broadcast for which we leverage the textual stream of subtitles associated with the broadcast. We model this task as a Markov decision process and propose a method that uses reinforcement learning to directly optimize the retrieval effectiveness of queries generated from the stream of subtitles. Our dynamic query modeling approach significantly outperforms state-of-the-art baselines for stationary query modeling and for text-based retrieval in a television setting. In particular, we find that carefully weighting terms and decaying these weights based on recency significantly improves effectiveness. Moreover, our method is highly efficient and can be used in a live television setting, i.e., in near real time.
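The effect of decaying term weights by recency can be illustrated with a small Python sketch. This is not the reinforcement learning method from the paper, which learns how to weight terms; it simply assumes a fixed exponential decay controlled by a hypothetical `half_life` parameter, whitespace tokenization, and an optional stopword set, and returns the top-`k` terms as a weighted query.

```python
from collections import defaultdict


def decayed_query(subtitle_lines, now, half_life=30.0, k=5, stopwords=frozenset()):
    """Build a weighted query from timestamped subtitle lines.

    subtitle_lines: iterable of (timestamp_in_seconds, text) pairs.
    Every occurrence of a term adds 2 ** (-(now - t) / half_life) to its weight,
    so a term heard `half_life` seconds ago counts half as much as one heard now.
    """
    weights = defaultdict(float)
    for t, text in subtitle_lines:
        if t > now:
            continue  # not broadcast yet
        decay = 2.0 ** (-(now - t) / half_life)
        for term in text.lower().split():
            if term not in stopwords:
                weights[term] += decay
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]  # top-k (term, weight) pairs form the query
```

Raising `half_life` makes the query more stationary; lowering it makes the query track the most recent subtitles more closely.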

  • [PDF] D. Odijk, E. Meij, I. Sijaranamual, and M. de Rijke, “Dynamic query modeling for related content finding,” in SIGIR 2015: 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015.
    [Bibtex]
    @inproceedings{SIGIR:2015:Odijk,
    Author = {Odijk, Daan and Meij, Edgar and Sijaranamual, Isaac and de Rijke, Maarten},
    Booktitle = {{SIGIR 2015: 38th International ACM SIGIR Conference on Research and Development in Information Retrieval}},
    Date-Added = {2015-08-06 13:14:13 +0000},
    Date-Modified = {2015-08-06 13:39:24 +0000},
    Month = {August},
    Publisher = {ACM},
    Title = {Dynamic query modeling for related content finding},
    Year = {2015}}

Learning to Explain Entity Relationships in Knowledge Graphs

We study the problem of explaining relationships between pairs of knowledge graph entities with human-readable descriptions. Our method extracts and enriches sentences that refer to an entity pair from a corpus and ranks the sentences according to how well they describe the relationship between the entities. We model this task as a learning to rank problem for sentences and employ a rich set of features. When evaluated on a large set of manually annotated sentences, we find that our method significantly improves over state-of-the-art baseline models.
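To make the learning to rank framing concrete, the sketch below trains a pairwise perceptron on sentences labelled by how well they describe an entity pair, then ranks new sentences with the learned weights. The three features and the perceptron learner are illustrative assumptions, far simpler than the paper's rich feature set, and `features`, `train_pairwise` and `rank` are hypothetical names.

```python
def features(sentence, e1, e2):
    """Hypothetical features for a candidate sentence about entity pair (e1, e2)."""
    s = sentence.lower()
    has1, has2 = e1.lower() in s, e2.lower() in s
    n_tokens = len(s.split())
    return [
        1.0 if has1 and has2 else 0.0,  # mentions both entities
        1.0 if has1 != has2 else 0.0,  # mentions exactly one entity
        1.0 / (1.0 + n_tokens),        # inverse sentence length
    ]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(examples, epochs=10, lr=1.0):
    """Pairwise perceptron; examples is a list of (feature_vector, relevance_label)."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for xi, yi in examples:
            for xj, yj in examples:
                if yi <= yj:
                    continue  # only pairs where xi should outrank xj
                diff = [a - b for a, b in zip(xi, xj)]
                if dot(w, diff) <= 0:  # pair misordered or tied: update towards xi
                    w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w


def rank(sentences, w, e1, e2):
    """Order sentences by their learned linear score, best first."""
    return sorted(sentences, key=lambda s: -dot(w, features(s, e1, e2)))
```

Each training example is a `(features(sentence, e1, e2), label)` pair; the perceptron only updates when a higher-labelled sentence fails to outscore a lower-labelled one.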

  • [PDF] N. Voskarides, E. Meij, M. Tsagkias, M. de Rijke, and W. Weerkamp, “Learning to explain entity relationships in knowledge graphs,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015, pp. 564–574.
    [Bibtex]
    @inproceedings{ACL:2015:Voskarides,
    Author = {Voskarides, Nikos and Meij, Edgar and Tsagkias, Manos and de Rijke, Maarten and Weerkamp, Wouter},
    Booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)},
    Date-Added = {2015-08-06 13:08:02 +0000},
    Date-Modified = {2015-08-06 13:08:14 +0000},
    Location = {Beijing, China},
    Pages = {564--574},
    Publisher = {Association for Computational Linguistics},
    Title = {Learning to Explain Entity Relationships in Knowledge Graphs},
    Url = {http://aclweb.org/anthology/P15-1055},
    Year = {2015},
    Bdsk-Url-1 = {http://aclweb.org/anthology/P15-1055}}

Fast and Space-Efficient Entity Linking in Queries

Entity linking deals with identifying entities from a knowledge base in a given piece of text and has become a fundamental building block for web search engines, enabling numerous downstream improvements from better document ranking to enhanced search results pages. A key problem in the context of web search queries is that this process needs to run under severe time constraints, as it has to be performed before any actual retrieval takes place, typically within milliseconds. In this paper we propose a probabilistic model that leverages user-generated information on the web to link queries to entities in a knowledge base. There are three key ingredients that make the algorithm fast and space-efficient. First, the linking process ignores any dependencies between the different entity candidates, which allows for an O(k^2) implementation in the number of query terms k. Second, we leverage hashing and compression techniques to reduce the memory footprint. Finally, to equip the algorithm with contextual knowledge without sacrificing speed, we factor the distance between the distributional semantic representations of the query words and of the entities into the model. We show that our solution significantly outperforms several state-of-the-art baselines by more than 14% while being able to process queries in sub-millisecond times, which is at least two orders of magnitude faster than existing systems.
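The first ingredient, scoring candidates independently, is what keeps linking quadratic: a query of k terms has k(k+1)/2 contiguous segments, and each segment needs only one table lookup. The Python sketch below assumes a hypothetical in-memory `alias_table` that maps surface forms to entity probabilities; it leaves out the hashing, compression and distributional-semantics components of the paper's model.

```python
def candidate_segments(query):
    """Yield every contiguous segment of the query: k*(k+1)/2 of them for k terms."""
    terms = query.lower().split()
    for i in range(len(terms)):
        for j in range(i + 1, len(terms) + 1):
            yield " ".join(terms[i:j])


def link(query, alias_table, threshold=0.0):
    """Score each segment independently against a hypothetical alias table.

    alias_table maps a surface form to {entity: P(entity | surface form)}.
    Because candidates are scored without modelling their dependencies,
    the work is one dictionary lookup per segment, i.e. O(k^2) lookups.
    """
    best = {}  # entity -> (probability, segment that produced it)
    for segment in candidate_segments(query):
        for entity, p in alias_table.get(segment, {}).items():
            if p > threshold and p > best.get(entity, (0.0, ""))[0]:
                best[entity] = (p, segment)
    return sorted(((e, seg, p) for e, (p, seg) in best.items()), key=lambda t: -t[2])
```

For the query `new york pizza`, the six segments are `new`, `new york`, `new york pizza`, `york`, `york pizza` and `pizza`; each is looked up once, whatever the other candidates are.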

  • [PDF] R. Blanco, G. Ottaviano, and E. Meij, “Fast and space-efficient entity linking in queries,” in Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, 2015.
    [Bibtex]
    @inproceedings{WSDM:2015:blanco,
    Author = {Blanco, Roi and Ottaviano, Giuseppe and Meij, Edgar},
    Booktitle = {Proceedings of the Eighth ACM International Conference on Web Search and Data Mining},
    Date-Added = {2011-10-26 11:21:51 +0200},
    Date-Modified = {2015-01-20 20:29:19 +0000},
    Series = {WSDM 2015},
    Title = {Fast and Space-Efficient Entity Linking in Queries},
    Year = {2015},
    Bdsk-Url-1 = {http://doi.acm.org/10.1145/1935826.1935842}}
CIKM 2014

Time-Aware Rank Aggregation for Microblog Search

We tackle the problem of searching microblog posts and frame it as a rank aggregation problem where we merge result lists generated by separate rankers so as to produce a final ranking to be returned to the user. We propose a rank aggregation method, TimeRA, that is able to infer the rank scores of documents via latent factor modeling. It is time-aware and rewards posts that are published in or near a burst of posts that are ranked highly in many of the lists being aggregated. Our experimental results show that it significantly outperforms state-of-the-art rank aggregation and time-sensitive microblog search algorithms.
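TimeRA itself infers document scores through latent factor modeling; the sketch below swaps in a much simpler technique, Borda-count fusion followed by a time-based boost, only to illustrate the idea of rewarding posts published in or near a burst of highly ranked posts. The hourly `bin_size`, the `alpha` weight and the half weight given to neighbouring bins are all hypothetical choices.

```python
from collections import defaultdict


def borda(rank_lists):
    """Borda count: a document at position r in a list of length n gets n - r points."""
    scores = defaultdict(float)
    for ranking in rank_lists:
        n = len(ranking)
        for r, doc in enumerate(ranking):
            scores[doc] += n - r
    return scores


def time_aware_fuse(rank_lists, timestamps, bin_size=3600, alpha=0.5):
    """Fuse ranked lists, then boost documents published in or near bursts.

    A 'burst' is approximated as a time bin whose documents carry a lot of
    Borda mass; the two neighbouring bins contribute half their mass.
    """
    base = borda(rank_lists)
    mass = defaultdict(float)  # time bin -> summed Borda score of its posts
    for doc, s in base.items():
        mass[timestamps[doc] // bin_size] += s
    peak = max(mass.values())
    fused = {}
    for doc, s in base.items():
        b = timestamps[doc] // bin_size
        near = mass[b] + 0.5 * (mass.get(b - 1, 0.0) + mass.get(b + 1, 0.0))
        fused[doc] = s * (1.0 + alpha * near / peak)
    return sorted(fused, key=lambda d: (-fused[d], d))
```

When two posts tie on Borda points, the one published in the time bin with more highly ranked posts comes out on top.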

Example of entity linking for tweets, to support tweets summarization

Personalized Time-Aware Tweets Summarization

To appear as a full paper at SIGIR 2013.

In this paper we focus on selecting meaningful tweets given a user’s interests. Specifically, we consider the task of time-aware tweets summarization, based on a user’s history and collaborative social influences from “social circles.”

Generating Pseudo Test Collections for Learning to Rank Scientific Articles

Pseudo test collections are automatically generated to provide training material for learning to rank methods. We propose a method for generating pseudo test collections in the domain of digital libraries, where data is relatively sparse but comes with rich annotations. Our intuition is that documents are annotated to make them easier to find for certain information needs. We use these annotations and the associated documents as a source for pairs of queries and relevant documents. We investigate how learning to rank performance varies when we use different methods for sampling annotations, and compare the system rankings produced by our pseudo test collection with those produced by editorial topics with editorial judgments. Our results demonstrate that it is possible to train a learning to rank algorithm on generated pseudo judgments. In some cases, performance is on par with learning on manually obtained ground truth.
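The annotation-to-query idea can be sketched in a few lines of Python: each annotation becomes a pseudo query, and the documents that carry it become its relevant documents. The minimum document count, the uniform random sampling of annotations and the function name `pseudo_collection` are assumptions for illustration; the paper compares several sampling methods, which are not reproduced here.

```python
import random
from collections import defaultdict


def pseudo_collection(doc_annotations, min_docs=2, max_queries=None, seed=0):
    """Turn document annotations into pseudo queries with relevant documents.

    doc_annotations maps a document id to its set of annotation strings
    (e.g. subject headings). Each annotation becomes a pseudo query whose
    relevant documents are the documents carrying that annotation.
    """
    inverted = defaultdict(set)  # annotation -> documents annotated with it
    for doc, annotations in doc_annotations.items():
        for a in annotations:
            inverted[a].add(doc)
    # Keep annotations shared by enough documents to act as an information need.
    candidates = sorted(a for a, docs in inverted.items() if len(docs) >= min_docs)
    if max_queries is not None and max_queries < len(candidates):
        # Uniform sampling: one simple (assumed) strategy among many possible ones.
        candidates = random.Random(seed).sample(candidates, max_queries)
    return {a: sorted(inverted[a]) for a in candidates}
```

The resulting `{query: [relevant document ids]}` mapping can serve as pseudo relevance judgments for training a learning to rank model.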

  • [PDF] R. Berendsen, M. Tsagkias, M. de Rijke, and E. Meij, “Generating pseudo test collections for learning to rank scientific articles,” in Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics – Third International Conference of the CLEF Initiative, CLEF 2012, 2012.
    [Bibtex]
    @inproceedings{CLEF:2012:berendsen,
    Author = {Berendsen, Richard and Tsagkias, Manos and de Rijke, Maarten and Meij, Edgar},
    Booktitle = {Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics - Third International Conference of the CLEF Initiative, CLEF 2012},
    Date-Added = {2012-07-03 13:44:06 +0200},
    Date-Modified = {2012-10-30 08:37:52 +0000},
    Title = {Generating Pseudo Test Collections for Learning to Rank Scientific Articles},
    Year = {2012}}
Twitter aspects

Identifying Entity Aspects in Microblog Posts

Online reputation management is about monitoring and handling the public image of entities (such as companies) on the Web. An important task in this area is identifying aspects of the entity of interest (such as products, services, competitors, key people, etc.) given a stream of microblog posts referring to the entity. In this paper we compare different IR techniques and opinion target identification methods for automatically identifying aspects and find that (i) simple statistical methods such as TF.IDF are a strong baseline for the task, performing significantly better than opinion-oriented methods, and (ii) considering only terms tagged as nouns improves the results for all the methods analyzed.

More information on the dataset that we created (and used in this paper) can be found here.
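The TF.IDF baseline with the noun filter can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes posts are already tokenized and part-of-speech tagged with Penn Treebank tags, computes term frequency over the entity's posts and IDF over a background collection plus those posts, and returns the top-`k` nouns. The function name `aspect_terms` is hypothetical.

```python
import math
from collections import Counter


def aspect_terms(entity_posts, background_posts, k=3, noun_tags=("NN", "NNS", "NNP", "NNPS")):
    """Rank candidate aspect terms for an entity by TF.IDF, keeping nouns only.

    Posts are lists of (token, pos_tag) pairs, assumed to be pre-tagged.
    TF is counted over the entity's posts; IDF is computed over the background
    posts plus the entity's posts, so every counted term has a nonzero DF.
    """
    corpus = list(background_posts) + list(entity_posts)
    df = Counter()
    for post in corpus:
        df.update({tok.lower() for tok, _ in post})  # document frequency per post
    tf = Counter(
        tok.lower()
        for post in entity_posts
        for tok, tag in post
        if tag in noun_tags  # the noun filter
    )
    n = len(corpus)
    scores = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Without the noun filter, verbs and adjectives that are rare in the background collection can outrank nouns, which is the kind of noise the filter is meant to remove.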

  • [PDF] D. Spina, E. Meij, M. de Rijke, A. Oghina, B. M. Thuong, and M. Breuss, “Identifying entity aspects in microblog posts,” in The 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2012.
    [Bibtex]
    @inproceedings{SIGIR:2012:spina,
    Author = {Damiano Spina and Meij, Edgar and de Rijke, Maarten and Andrei Oghina and Bui Minh Thuong and Mathias Breuss},
    Booktitle = {The 35th International ACM SIGIR Conference on Research and Development in Information Retrieval},
    Date-Added = {2012-05-03 22:17:17 +0200},
    Date-Modified = {2012-10-30 08:40:47 +0000},
    Series = {SIGIR 2012},
    Title = {Identifying Entity Aspects in Microblog Posts},
    Year = {2012}}