🤖 AI Summary
Existing unsupervised keyphrase prediction methods rely on heuristically defined importance scores, which can yield inaccurate informativeness estimates, and they give little consideration to time efficiency. To address these limitations, we propose ERU-KG, an unsupervised keyphrase generation model with two modules: (1) an informativeness module that learns term-level informativeness from references (queries, citation contexts, and titles) and scores phrases by aggregating term scores, removing the need to model candidate phrases explicitly; and (2) a phraseness module that generates the keyphrase candidates. The model can switch between keyphrase generation and extraction by adjusting hyperparameters. On keyphrase generation benchmarks, ERU-KG outperforms unsupervised baselines and reaches on average 89% of the performance of a supervised model for top-10 predictions. Its keyphrases are also effective as query and document expansions in text retrieval tasks, and it is the fastest at inference among baselines of similar model size.
📝 Abstract
Unsupervised keyphrase prediction has gained growing interest in recent years. However, existing methods typically rely on heuristically defined importance scores, which may lead to inaccurate informativeness estimation. In addition, they give little consideration to time efficiency. To solve these problems, we propose ERU-KG, an unsupervised keyphrase generation (UKG) model that consists of an informativeness module and a phraseness module. The former estimates the relevance of keyphrase candidates, while the latter generates those candidates. The informativeness module innovates by learning to model informativeness through references (e.g., queries, citation contexts, and titles) and at the term level, thereby 1) capturing how the key concepts of documents are perceived in different contexts and 2) estimating the informativeness of phrases more efficiently by aggregating term informativeness, removing the need for explicit modeling of the candidates. ERU-KG demonstrates its effectiveness on keyphrase generation benchmarks by outperforming unsupervised baselines and achieving on average 89% of the performance of a supervised model for top-10 predictions. Additionally, to highlight its practical utility, we evaluate the model on text retrieval tasks and show that keyphrases generated by ERU-KG are effective when employed as query and document expansions. Furthermore, inference speed tests reveal that ERU-KG is the fastest among baselines of similar model sizes. Finally, our proposed model can switch between keyphrase generation and extraction by adjusting hyperparameters, catering to diverse application requirements.
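
The idea of scoring a phrase by aggregating term-level informativeness can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the ERU-KG implementation: the abstract does not specify the aggregation function (a simple mean is used here), the function names `phrase_informativeness` and `rank_candidates` are hypothetical, and the term scores are made-up numbers standing in for the output of a learned informativeness module.

```python
from typing import Dict, List, Tuple


def phrase_informativeness(phrase: str, term_scores: Dict[str, float]) -> float:
    """Score a phrase as the mean of its per-term scores.

    Assumption: mean aggregation, with unseen terms scored 0.0.
    The abstract only says term informativeness is aggregated.
    """
    terms = phrase.lower().split()
    if not terms:
        return 0.0
    return sum(term_scores.get(t, 0.0) for t in terms) / len(terms)


def rank_candidates(
    candidates: List[str],
    term_scores: Dict[str, float],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Rank candidate phrases by aggregated term informativeness."""
    scored = [(c, phrase_informativeness(c, term_scores)) for c in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical term-level scores (illustrative values only).
term_scores = {
    "keyphrase": 0.9,
    "generation": 0.7,
    "unsupervised": 0.6,
    "recent": 0.1,
    "years": 0.05,
}

# Hypothetical candidates, e.g. as a phraseness module might propose them.
candidates = [
    "keyphrase generation",
    "unsupervised keyphrase generation",
    "recent years",
]

ranked = rank_candidates(candidates, term_scores, top_k=2)
for phrase, score in ranked:
    print(f"{phrase}: {score:.3f}")
```

Because each term is scored once and phrase scores are cheap lookups plus an average, the cost of ranking grows with the number of terms rather than requiring a separate model pass per candidate, which is the efficiency argument the abstract makes.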