Twitter aspects

Identifying Entity Aspects in Microblog Posts

Online reputation management is about monitoring and handling the public image of entities (such as companies) on the Web. An important task in this area is identifying aspects of the entity of interest (such as products, services, competitors, key people, etc.) given a stream of microblog posts referring to the entity. In this paper we compare different IR techniques and opinion target identification methods for automatically identifying aspects and find that (i) simple statistical methods such as TF.IDF are a strong baseline for the task, performing significantly better than opinion-oriented methods, and (ii) considering only terms tagged as nouns improves the results for all the methods analyzed.
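To make the TF.IDF baseline with noun filtering concrete, here is a minimal sketch; it is not the paper's implementation. It assumes that tweets arrive already tokenized and POS-tagged as `(token, tag)` pairs, that noun tags start with `N` (Penn Treebank style), and that all tweets of one entity are merged into a single pseudo-document so that IDF is computed across entities. The function name `rank_aspects` and the toy data are hypothetical.

```python
import math
from collections import Counter


def rank_aspects(tweets_by_entity, entity, nouns_only=True, top_k=5):
    """Rank candidate aspect terms for `entity` by TF.IDF.

    tweets_by_entity maps an entity name to a list of tweets; each tweet
    is a list of (token, pos_tag) pairs tagged upstream (assumption).
    """
    def terms(tweets):
        # Optionally keep only tokens whose tag marks them as nouns.
        return [tok.lower() for tweet in tweets for tok, tag in tweet
                if not nouns_only or tag.startswith("N")]

    # One pseudo-document (bag of terms) per entity.
    pseudo_docs = {e: Counter(terms(tw)) for e, tw in tweets_by_entity.items()}
    n_docs = len(pseudo_docs)

    # Document frequency: in how many entities' tweets a term occurs.
    df = Counter()
    for counts in pseudo_docs.values():
        df.update(counts.keys())

    tf = pseudo_docs[entity]
    scores = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Toy, made-up data for illustration only.
tweets_by_entity = {
    "apple": [
        [("new", "JJ"), ("iphone", "NN"), ("rocks", "VBZ")],
        [("iphone", "NN"), ("battery", "NN"), ("service", "NN")],
    ],
    "nokia": [
        [("lumia", "NN"), ("battery", "NN")],
        [("great", "JJ"), ("service", "NN"), ("lumia", "NN")],
    ],
}
ranked = rank_aspects(tweets_by_entity, "apple")
ranked_all = rank_aspects(tweets_by_entity, "apple", nouns_only=False)
```

In this toy example, terms shared by both entities (such as "battery") receive an IDF of zero, so the entity-specific noun "iphone" ranks first; with `nouns_only=False`, non-noun terms such as "rocks" also enter the candidate list.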

More information on the dataset that we created (and used in this paper) can be found here.

  • [PDF] D. Spina, E. Meij, M. de Rijke, A. Oghina, B. M. Thuong, and M. Breuss, “Identifying entity aspects in microblog posts,” in The 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012), 2012.

A Corpus for Entity Profiling in Microblog Posts

Microblogs have become an invaluable source of information for online reputation management. An emerging problem in this field is identifying the key aspects of an entity that are commented on in microblog posts. Streams of microblog posts are of great value because of their direct and real-time nature, and synthesizing them in the form of entity profiles helps reputation managers keep track of the public image of the entity. Determining such aspects can be non-trivial because of creative language usage, the highly contextualized and informal nature of microblog posts, and the limited length of this form of communication.

In this paper we present two manually annotated corpora for evaluating the task of identifying aspects on Twitter, both based upon the WePS-3 ORM task dataset and made available online. The first is created using a pooling methodology, for which we have implemented various methods for automatically extracting aspects from tweets that are relevant for an entity. Human assessors have then labeled each of the candidate aspects for relevance. The second corpus is more fine-grained and contains opinion targets. Here, annotators consider individual tweets related to an entity and manually identify whether the tweet is opinionated and, if so, which part of the tweet is subjective and what the target of the sentiment is, if any.
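The pooling step for the first corpus can be sketched roughly as follows. This is a generic illustration of pooling as commonly used in IR evaluation, not the authors' code: the top candidates from each extraction method are merged into one de-duplicated pool, which is then handed to assessors. The function name `build_pool`, the method names, the pool depth, and the sample rankings are all hypothetical.

```python
def build_pool(runs, depth=10):
    """Merge the top-`depth` candidates of every method into one pool.

    runs maps a method name to its ranked list of candidate aspects.
    Returns a dict: candidate -> set of methods that contributed it.
    """
    pool = {}
    for method, ranking in runs.items():
        for term in ranking[:depth]:
            pool.setdefault(term, set()).add(method)
    return pool


# Made-up rankings from two hypothetical extraction methods.
runs = {
    "tfidf": ["iphone", "battery", "store"],
    "opinion": ["battery", "price", "iphone"],
}
pool = build_pool(runs, depth=2)

# Candidates are presented to assessors in a method-independent order
# (here alphabetical), so the judgments are not biased toward one run.
to_assess = sorted(pool)
```

Keeping track of which methods contributed each candidate is optional; it is included here only because it makes it easy to inspect how much the methods overlap.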

You can find more information on this test collection at

  • [PDF] D. Spina, E. Meij, A. Oghina, B. M. Thuong, M. Breuss, and M. de Rijke, “A corpus for entity profiling in microblog posts,” in LREC 2012 Workshop on Language Engineering for Online Reputation Management, 2012.