Microblogs have become an important source of information for marketing, intelligence, and reputation management purposes. Streams of microblogs are of great value because of their direct and real-time nature. Determining what an individual microblog post is about, however, can be non-trivial because of creative language usage, the highly contextualized and informal nature of microblog posts, and the limited length of this form of communication.

We propose a solution to the problem of determining what a microblog post is about through semantic linking: we add semantics to a post by automatically identifying concepts that are semantically related to it and generating links to the corresponding Wikipedia articles. The identified concepts can subsequently be used for, e.g., social media mining, thereby reducing the need for manual inspection and selection. Using a purpose-built test collection of tweets, we show that recently proposed approaches for semantic linking do not perform well, mainly due to the idiosyncratic nature of microblog posts. We propose a novel method based on machine learning with a set of innovative features and show that it achieves significant improvements over all other methods, especially in terms of precision.
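To make the idea of semantic linking concrete, here is a minimal sketch of a dictionary-based linker. It is not the machine-learning method proposed in the paper: it only illustrates the general task of mapping word n-grams in a post to Wikipedia article titles. The `ANCHORS` lexicon, its counts, the `threshold` parameter, and the function names are all hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical toy lexicon: anchor text -> {Wikipedia title: link count}.
# A real system would derive such counts from Wikipedia's link structure.
ANCHORS = {
    "obama": {"Barack Obama": 95, "Michelle Obama": 5},
    "white house": {"White House": 100},
    "house": {"House": 60, "House (TV series)": 40},
}


def ngrams(tokens, max_n=3):
    """Yield all word n-grams of length 1..max_n as space-joined strings."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def wiki_url(title):
    """Build an English Wikipedia URL from an article title."""
    return "https://en.wikipedia.org/wiki/" + title.replace(" ", "_")


def link_post(post, anchors=ANCHORS, threshold=0.5):
    """Return (title, score) pairs for concepts matched in the post.

    The score is the fraction of times the n-gram points to that title
    in the toy lexicon; titles scoring below `threshold` are dropped.
    """
    tokens = re.findall(r"[a-z0-9]+", post.lower())
    scores = {}
    for gram in ngrams(tokens):
        senses = anchors.get(gram)
        if not senses:
            continue
        total = sum(senses.values())
        for title, count in senses.items():
            score = count / total
            if score >= threshold:
                scores[title] = max(scores.get(title, 0.0), score)
    # Highest score first; ties broken alphabetically by title.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for title, score in link_post("Obama speaks at the White House #politics"):
        print(f"{score:.2f}  {wiki_url(title)}")
```

Note that this lookup also links the generic concept "House" because the unigram matches on its own. Spurious matches of this kind are one reason a simple lexical approach can suffer in precision on short, informal posts.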

  • [PDF] E. Meij, W. Weerkamp, and M. de Rijke, “Adding Semantics to Microblog Posts,” in Proceedings of the fifth ACM international conference on Web search and data mining, New York, NY, USA, 2012.
    [Bibtex]
    @inproceedings{WSDM:2012:meij,
      Address = {New York, NY, USA},
      Author = {Meij, Edgar and Weerkamp, Wouter and de Rijke, Maarten},
      Booktitle = {Proceedings of the fifth ACM international conference on Web search and data mining},
      Publisher = {ACM},
      Series = {WSDM '12},
      Title = {Adding Semantics to Microblog Posts},
      Year = {2012},
      Bdsk-Url-1 = {http://doi.acm.org/10.1145/1935826.1935842}}