Edgar Meij

semantic search research ッ

Histogram indicating the number of documents vs the number of keyphrases

A Comparative Study of Features for Keyphrase Extraction

Posted 19/11/2009 in Conference Papers, Publications

Keyphrases are short phrases that reflect the main topic of a document. Because manually annotating documents with keyphrases is a time-consuming process, several automatic approaches have been developed. Typically, candidate phrases are extracted using features such as position or frequency in the document text. Many different features have been suggested, and have been used individually or in combination. However, it is not clear which of these features are most informative for this task.
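To make the notion of position and frequency features concrete, here is a minimal sketch. It is an illustration only, not the extraction pipeline from the paper: the function name `candidate_features`, the use of plain word n-grams as candidates, and the `max_len` parameter are all assumptions made for this example.

```python
import re
from collections import Counter


def candidate_features(text, max_len=3):
    """Return {phrase: (frequency, relative first position)} for word n-grams.

    Hypothetical helper: candidates are simply all n-grams of up to
    `max_len` lowercase tokens, which is an assumption for illustration.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    n = len(tokens)
    freq = Counter()
    first_pos = {}
    for size in range(1, max_len + 1):
        for i in range(n - size + 1):
            phrase = " ".join(tokens[i:i + size])
            freq[phrase] += 1
            # Keep only the first occurrence, as a fraction of document length.
            first_pos.setdefault(phrase, i / n)
    return {p: (freq[p], first_pos[p]) for p in freq}


features = candidate_features(
    "Keyphrase extraction assigns keyphrases to documents. "
    "Keyphrase extraction often relies on simple features."
)
print(features["keyphrase extraction"])
```

A candidate that occurs often and early in the text gets a high frequency and a low relative position, and a ranker or classifier could then combine the two values.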

We address this issue in the context of keyphrase extraction from scientific literature. We introduce a new corpus that consists of full-text journal articles and is substantially larger than the data sets used in previous work. In addition, the rich collection and document structure available at the publishing stage is explicitly annotated. We suggest new features based on this structure and compare them to existing features, analyzing how the different features capture different aspects of the keyphrase extraction task.
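This summary does not spell out the structure-based features, so the following is only a rough sketch of what such a feature could look like. It assumes a document is available as a mapping from section names to text; the section names, the function `section_features`, and the example document are all hypothetical.

```python
def section_features(sections, phrase):
    """Count occurrences of `phrase` in each named section of a document.

    Uses case-insensitive substring matching, which is a simplification:
    it also matches inside longer words.
    """
    phrase = phrase.lower()
    return {name: text.lower().count(phrase) for name, text in sections.items()}


# Hypothetical document, split into sections for illustration.
doc = {
    "title": "The impact of document structure on keyphrase extraction",
    "abstract": "Keyphrases are short phrases that reflect the main topic of a document.",
    "body": "We compare features for keyphrase extraction from journal articles.",
}
counts = section_features(doc, "keyphrase extraction")
print(counts)
```

Per-section counts like these could sit next to position and frequency as extra inputs, so that a phrase appearing in the title is treated differently from one that only appears in the body.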

  • [PDF] K. Hofmann, M. Tsagkias, E. Meij, and M. de Rijke, “The impact of document structure on keyphrase extraction,” in Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009.
    [Bibtex]
    @inproceedings{CIKM:2009:hofmann,
    Author = {Hofmann, Katja and Tsagkias, Manos and Meij, Edgar and de Rijke, Maarten},
    Booktitle = {Proceedings of the 18th ACM conference on Information and knowledge management},
    Date-Added = {2011-10-12 18:31:55 +0200},
    Date-Modified = {2012-10-30 08:42:45 +0000},
    Series = {CIKM 2009},
    Title = {The impact of document structure on keyphrase extraction},
    Year = {2009},
    Bdsk-Url-1 = {http://doi.acm.org/10.1145/1645953.1646215}}

 

Tags: Keyphrase extraction, Machine learning, Semantic linking, Semanticizing, Text mining


Welcome!

This is the website of Edgar Meij. I lead several groups of researchers and engineers at Bloomberg working on knowledge graphs, question answering, information retrieval, machine learning, and more…
