MAPEX: A Multi-Agent Pipeline for Keyphrase Extraction

2025-09-23
AI Summary
Existing unsupervised keyphrase extraction methods predominantly rely on single-stage, uniform prompting strategies that do not adapt to document length or to the characteristics of the underlying large language model (LLM), which limits how fully the model's reasoning and generation capabilities are used. To address this, we propose MAPEX, the first multi-agent collaborative framework for unsupervised keyphrase extraction. It comprises five core modules: expert recruitment, candidate extraction, topic guidance, knowledge augmentation, and post-processing. Crucially, MAPEX introduces a dynamic dual-path strategy that selects between knowledge-driven extraction (for short documents) and topic-guided extraction (for long documents) based on input length. Leveraging prompt engineering, retrieval-augmented generation, topic modeling, and dynamic scheduling, MAPEX enables end-to-end keyphrase extraction. Evaluated on six benchmark datasets with three different LLMs, MAPEX achieves an average F1@5 gain of 2.44% over the current state-of-the-art unsupervised method and outperforms standard LLM baselines by 4.01%, demonstrating strong generalizability and cross-model compatibility.

Abstract
Keyphrase extraction is a fundamental task in natural language processing. However, existing unsupervised prompt-based methods for Large Language Models (LLMs) often rely on single-stage inference pipelines with uniform prompting, regardless of document length or LLM backbone. Such one-size-fits-all designs hinder the full exploitation of LLMs' reasoning and generation capabilities, especially given the complexity of keyphrase extraction across diverse scenarios. To address these challenges, we propose MAPEX, the first framework that introduces multi-agent collaboration into keyphrase extraction. MAPEX coordinates LLM-based agents through modules for expert recruitment, candidate extraction, topic guidance, knowledge augmentation, and post-processing. A dual-path strategy dynamically adapts to document length: knowledge-driven extraction for short texts and topic-guided extraction for long texts. Extensive experiments on six benchmark datasets across three different LLMs demonstrate its strong generalization and universality, outperforming the state-of-the-art unsupervised method by 2.44% and standard LLM baselines by 4.01% in F1@5 on average. Code is available at https://github.com/NKU-LITI/MAPEX.
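The reported gains are measured in F1@5. As a rough illustration of how such a score is commonly computed, here is a minimal sketch. It assumes exact matching after lowercasing and takes precision over the number of phrases actually returned in the top 5. The normalization used in the paper (for example, stemming) is not stated here, and the function name `f1_at_k` is hypothetical.

```python
from typing import List


def f1_at_k(predicted: List[str], gold: List[str], k: int = 5) -> float:
    """F1 between the top-k predicted keyphrases and the gold keyphrases.

    Assumption: phrases match exactly after lowercasing and trimming;
    the paper's own normalization may differ.
    """
    # Keep the first k distinct, non-empty predictions in ranked order.
    top_k: List[str] = []
    for phrase in predicted:
        norm = phrase.strip().lower()
        if norm and norm not in top_k:
            top_k.append(norm)
        if len(top_k) == k:
            break

    gold_set = {g.strip().lower() for g in gold if g.strip()}
    if not top_k or not gold_set:
        return 0.0

    matches = sum(1 for phrase in top_k if phrase in gold_set)
    if matches == 0:
        return 0.0

    precision = matches / len(top_k)
    recall = matches / len(gold_set)
    return 2 * precision * recall / (precision + recall)


predicted = ["keyphrase extraction", "multi-agent", "LLM", "topic model", "prompt"]
gold = ["keyphrase extraction", "LLM", "agents"]
score = f1_at_k(predicted, gold, k=5)
```

In this toy example two of the five predictions match the three gold phrases, so precision is 2/5, recall is 2/3, and the score is 0.5.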
Problem

Research questions and friction points this paper is trying to address.

Existing keyphrase extraction methods use uniform prompting regardless of document length
One-size-fits-all designs limit LLMs' reasoning capabilities for keyphrase extraction
Current approaches fail to adapt to diverse scenarios and LLM backbones
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent collaboration framework for keyphrase extraction
Dual-path strategy adapting to document length dynamically
Modular coordination with expert recruitment and topic guidance
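The dual-path idea can be sketched as a simple length-based dispatch. This is an illustrative sketch only, not the authors' implementation: the word-count threshold, the function names, and the placeholder extractors are all assumptions, since the summary does not say how the boundary between short and long documents is set. Following the abstract, short texts go to the knowledge-driven path and long texts to the topic-guided path.

```python
from typing import Callable, Dict, List

# Hypothetical cutoff; the source does not state the actual boundary.
LENGTH_THRESHOLD_WORDS = 512


def knowledge_driven_extract(document: str) -> List[str]:
    """Placeholder for the short-text path (knowledge augmentation)."""
    return ["knowledge-path"]


def topic_guided_extract(document: str) -> List[str]:
    """Placeholder for the long-text path (topic guidance)."""
    return ["topic-path"]


def select_path(document: str, threshold: int = LENGTH_THRESHOLD_WORDS) -> str:
    """Pick a path name from the whitespace-tokenized document length."""
    n_words = len(document.split())
    return "knowledge" if n_words <= threshold else "topic"


def extract_keyphrases(
    document: str, threshold: int = LENGTH_THRESHOLD_WORDS
) -> List[str]:
    """Route the document to one of the two placeholder extractors."""
    paths: Dict[str, Callable[[str], List[str]]] = {
        "knowledge": knowledge_driven_extract,
        "topic": topic_guided_extract,
    }
    return paths[select_path(document, threshold)](document)


short_result = extract_keyphrases("a short abstract", threshold=5)
long_result = extract_keyphrases("word " * 10, threshold=5)
```

In a real pipeline each placeholder would wrap LLM-based agents. The point of the sketch is only that the routing decision is made once, up front, from the input length.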
Liting Zhang
TMCC, College of Computer Science, Nankai University, Tianjin, China
Shiwan Zhao
Independent Researcher, Research Scientist of IBM Research - China (2000-2020)
AGI, Large Language Model, NLP, Speech, Recommender System
Aobo Kong
Nankai University
NLP, LLM
Qicheng Li
TMCC, College of Computer Science, Nankai University, Tianjin, China