We’re now hiring next year’s interns!

I’m happy to announce that we have just opened applications for next year’s internships at Yahoo Labs in Barcelona. If you’re a PhD student in a related field, do consider applying, especially if you’d like to spend some time in sunny Barcelona and gain research experience along the way.

The application form can be found at http://comunicacio.barcelonamedia.org/yahoo/. The deadline is January 13th, 2014.

Do reach out if you have any questions!

Entity Linking and Retrieval Tutorial @ SIGIR 2013 – Slides, Code, and Bibliography

The material for our “Entity Linking and Retrieval” tutorial (with Krisztian Balog and Daan Odijk) for SIGIR 2013 has been updated and is available online on GitHub (slides), Dropbox (slides), Mendeley, and Codecademy. All material is summarized on the tutorial’s webpage: http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/. See my other blog post for a brief summary.

Time-Aware Chi-squared for Document Filtering over Time

To appear at TAIA2013 (a SIGIR 2013 workshop).

Document filtering over time is widely applied in tasks such as tracking topics in online news or social media. We consider it a classification task, where topics of interest correspond to classes and the feature space consists of the words associated with each class. In “streaming” settings the set of words associated with a concept may change. In this paper we employ a multinomial Naive Bayes classifier and perform periodic feature selection to adapt to evolving topics. We propose two ways of employing Pearson’s χ² test for feature selection and demonstrate their benefit on the TREC KBA 2012 data set. By incorporating a time-dependent function in our equations for χ², we provide an elegant method for applying different weighting schemes. Experiments show that our approach improves over a non-adaptive baseline.
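To make the idea of a time-dependent χ² concrete, here is a minimal sketch. It is not the formulation from the paper: it assumes that each document's contribution to the 2×2 contingency table is weighted by an exponential decay of its age, and the names `time_weighted_chi2`, `select_features`, and `half_life` are hypothetical, chosen only for illustration.

```python
import math
from collections import defaultdict


def time_weighted_chi2(docs, now, half_life=7.0):
    """Score terms with a chi-squared statistic over time-weighted counts.

    docs: iterable of (timestamp, terms, is_relevant) tuples.
    Assumption (not from the paper): a document of age `now - timestamp`
    contributes weight 0.5 ** (age / half_life) instead of a count of 1.
    """
    decay = math.log(2) / half_life
    with_term_pos = defaultdict(float)  # relevant docs containing the term
    with_term_neg = defaultdict(float)  # non-relevant docs containing the term
    pos = neg = 0.0                     # weighted class totals

    for timestamp, terms, is_relevant in docs:
        weight = math.exp(-decay * (now - timestamp))
        if is_relevant:
            pos += weight
        else:
            neg += weight
        for term in set(terms):
            if is_relevant:
                with_term_pos[term] += weight
            else:
                with_term_neg[term] += weight

    total = pos + neg
    scores = {}
    for term in set(with_term_pos) | set(with_term_neg):
        a = with_term_pos.get(term, 0.0)   # term present, relevant
        b = with_term_neg.get(term, 0.0)   # term present, non-relevant
        c = max(pos - a, 0.0)              # term absent, relevant
        d = max(neg - b, 0.0)              # term absent, non-relevant
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # Standard 2x2 chi-squared, computed on weighted counts.
        scores[term] = total * (a * d - b * c) ** 2 / denom if denom > 0 else 0.0
    return scores


def select_features(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


if __name__ == "__main__":
    example = [
        (10.0, ["obama", "the"], True),
        (10.0, ["obama", "the"], True),
        (10.0, ["football", "the"], False),
        (10.0, ["weather", "the"], False),
    ]
    print(select_features(time_weighted_chi2(example, now=10.0), k=2))
```

Re-running the selection periodically with an updated `now` lets recent documents dominate the scores, which is one simple way to follow an evolving topic; other weighting functions can be swapped in by changing how `weight` is computed.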

Do support groups members disclose less to their partners? The dynamics of HIV disclosure in four African countries

To appear in BMC Public Health.

Background: Recent efforts to curtail the HIV epidemic in Africa have emphasized preventing sexual transmission to partners through antiretroviral therapy. A component of current strategies is disclosure to partners, so understanding its motivations will help maximise results. This study examines the rates, dynamics and consequences of partner disclosure in Burkina Faso, Kenya, Malawi and Uganda, with special attention to the role of support groups and stigma in disclosure.

Methods: The study employs mixed methods, including a cross-sectional client survey of counseling and testing services, focus groups, and in-depth interviews with HIV-positive individuals in stable partnerships in Burkina Faso, Kenya, Malawi and Uganda, recruited at healthcare facilities offering HIV testing.

Results: Rates of disclosure to partners varied between countries (32.7% – 92.7%). The lowest rate was reported in Malawi. Reasons for disclosure included preventing the transmission of HIV, the need for care, and upholding the integrity of the relationship. Fear of stigma was an important reason for non-disclosure. Women reported experiencing more negative reactions when disclosing to partners. Disclosure was positively associated with living in urban areas, higher education levels, and being male, and negatively associated with membership in support groups.

Conclusions: Understanding the reasons for disclosure and recognizing the role of support groups in the process can help improve current prevention efforts, which increasingly focus on treatment as prevention as a way to halt new infections. Support groups can help spread secondary prevention messages by explaining to their members that antiretroviral treatment has benefits for HIV-positive individuals and their partners. Home-based testing can further facilitate partner disclosure, as couples can test together and be counseled jointly.

Semantic TED

Multilingual Semantic Linking for Video Streams: Making “Ideas Worth Sharing” More Accessible

This paper describes our (winning!) submission to the Developers Challenge at WoLE2013, “Doing Good by Linking Entities.” We present a fully automatic system – called “Semantic TED” – which provides intelligent suggestions in the form of links to Wikipedia articles for video streams in multiple languages, based on the subtitles that accompany the visual content. The system is applied to online conference talks. In particular, we adapt a recently proposed semantic linking approach for streams of television broadcasts to facilitate generating contextual links while a TED talk is being viewed. TED is a highly popular global conference series covering many research domains; the publicly available talks have accumulated a total view count of over one billion at the time of writing. We exploit the multilinguality of Wikipedia and the TED subtitles to provide contextual suggestions in the language of the user watching a video. In this way, a vast source of educational and intellectual content is disclosed to a broad audience that might otherwise experience difficulties interpreting it.

  • D. Odijk, E. Meij, D. Graus, and T. Kenter, “Multilingual semantic linking for video streams: making "ideas worth sharing" more accessible,” in Proceedings of the 2nd International Workshop on Web of Linked Entities (WoLE 2013), 2013.
    BibTeX:
    @inproceedings{WOLE:2013:Odijk,
    Author = {Odijk, Daan and Meij, Edgar and Graus, David and Kenter, Tom},
    Booktitle = {Proceedings of the 2nd International Workshop on Web of Linked Entities (WoLE 2013)},
    Date-Added = {2013-05-15 14:09:58 +0000},
    Date-Modified = {2013-05-15 14:11:37 +0000},
    Title = {Multilingual Semantic Linking for Video Streams: Making "Ideas Worth Sharing" More Accessible},
    Year = {2013}}
Example entity linking for tweets, to support tweets summarization

Personalized Time-Aware Tweets Summarization

To appear as full paper at SIGIR 2013.

In this paper we focus on selecting meaningful tweets given a user’s interests. Specifically, we consider the task of time-aware tweets summarization, based on a user’s history and collaborative social influences from “social circles.”