AI Summary
Existing unsupervised keyword extraction methods predominantly rely on single-stage, uniform prompting strategies, which fail to adapt to varying document lengths and large language model (LLM) characteristics, thereby limiting inference and generation capabilities. To address this, we propose MAPEX, the first multi-agent collaborative framework for unsupervised keyword extraction. It comprises five core modules: expert recruitment, candidate generation, topic-guided refinement, knowledge enhancement, and post-processing. Crucially, MAPEX introduces a novel dynamic dual-path strategy that adaptively selects between knowledge-driven (for short documents) and topic-guided (for long documents) extraction modes based on input length. Leveraging prompt engineering, retrieval-augmented generation, topic modeling, and dynamic scheduling, MAPEX enables end-to-end keyword extraction. Evaluated on six benchmark datasets, MAPEX achieves an average F1@5 gain of 2.44% over the current state-of-the-art unsupervised methods and outperforms standard LLM baselines by 4.01%, demonstrating strong generalizability and cross-model compatibility.
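The length-based routing described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the token threshold, and whitespace tokenization are all assumptions made for clarity.

```python
# Hypothetical sketch of MAPEX's dynamic dual-path routing.
# The threshold value and tokenization are illustrative assumptions,
# not details taken from the paper.

def choose_extraction_path(document: str, length_threshold: int = 200) -> str:
    """Select the extraction mode based on document length in tokens."""
    n_tokens = len(document.split())  # crude whitespace tokenization
    if n_tokens <= length_threshold:
        # Short text offers little context, so augment it with
        # retrieved external knowledge (knowledge-driven path).
        return "knowledge-driven"
    # Long text contains enough content to mine latent topics
    # that guide keyphrase selection (topic-guided path).
    return "topic-guided"
```

In practice the router would hand the document to the corresponding agent pipeline; the point here is only that the mode is a deterministic function of input length.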
Abstract
Keyphrase extraction is a fundamental task in natural language processing. However, existing unsupervised prompt-based methods for Large Language Models (LLMs) often rely on single-stage inference pipelines with uniform prompting, regardless of document length or LLM backbone. Such one-size-fits-all designs hinder the full exploitation of LLMs' reasoning and generation capabilities, especially given the complexity of keyphrase extraction across diverse scenarios. To address these challenges, we propose MAPEX, the first framework that introduces multi-agent collaboration into keyphrase extraction. MAPEX coordinates LLM-based agents through modules for expert recruitment, candidate extraction, topic guidance, knowledge augmentation, and post-processing. A dual-path strategy dynamically adapts to document length: knowledge-driven extraction for short texts and topic-guided extraction for long texts. Extensive experiments on six benchmark datasets across three different LLMs demonstrate its strong generalization and universality, outperforming the state-of-the-art unsupervised method by 2.44% and standard LLM baselines by 4.01% in F1@5 on average. Code is available at https://github.com/NKU-LITI/MAPEX.