A Comparative Study of Features for Keyphrase Extraction

Keyphrases are short phrases that reflect the main topic of a document. Because manually annotating documents with keyphrases is a time-consuming process, several automatic approaches have been developed. Typically, candidate phrases are extracted using features such as position or frequency in the document text. Many different features have been suggested and used, either individually or in combination. However, it is not clear which of these features are most informative for this task.

We address this issue in the context of keyphrase extraction from scientific literature. We introduce a new corpus that consists of full-text journal articles and is substantially larger than data sets used in previous work. In addition, the rich collection and document structure available at the publishing stage is explicitly annotated. We suggest new features based on this structure and compare them to existing features, analyzing how the different features capture different aspects of the keyphrase extraction task.

[bibtex key=CIKM:2009:hofmann]


Edgar Meij
