🤖 AI Summary
The keyphrase generation (KPGen) field has long lacked systematic surveys and standardized evaluation protocols, resulting in severe dataset overlap, inconsistent metric computation, and inflated performance claims. Method: We conduct the first large-scale bibliometric analysis and cross-paper reproducibility study across 50+ representative works, standardize evaluation using F1 and Exact Match, and introduce KPGen-BART—a high-performance, plug-and-play PLM-based model—alongside a unified preprocessing and evaluation framework. Results: Our analysis reveals an 87% sample overlap across mainstream benchmark datasets; standardized re-evaluation shows that state-of-the-art models are, on average, overestimated by 12.3%. KPGen-BART establishes a new reproducible, comparable baseline, addressing the critical shortage of high-quality open-source KPGen models and enabling rigorous, transparent progress assessment in the field.
📝 Abstract
Keyphrase generation refers to the task of producing a set of words or phrases that summarises the content of a document. Sustained effort has been devoted to this task over the past few years, spanning multiple lines of research such as model architectures, data resources, and use-case scenarios. Yet the current state of keyphrase generation remains unclear, as there has been no attempt to review and analyse previous work. In this paper, we bridge this gap by presenting an analysis of over 50 research papers on keyphrase generation, offering a comprehensive overview of recent progress, limitations, and open challenges. Our findings highlight several critical issues in current evaluation practices, such as the concerning similarity among commonly used benchmark datasets and inconsistencies in metric calculation that lead to overestimated performance. Additionally, we address the limited availability of pre-trained models by releasing a strong PLM-based model for keyphrase generation in an effort to facilitate future research.