🤖 AI Summary
Existing reinforcement learning approaches for enhancing large language models' (LLMs) reasoning capabilities rely heavily on human-annotated data and verifiable reward signals, limiting generalization and scalability; while self-play methods reduce human supervision, they still require external execution environments (e.g., Python interpreters), hindering applicability to general-purpose tasks. Method: We propose Multi-Agent Evolve (MAE), a novel framework that instantiates a Proposer–Solver–Judge tripartite agent architecture within a single LLM. MAE enables closed-loop self-improvement via problem generation, autonomous solution synthesis, and joint evaluation, eliminating dependence on external environment feedback. Contribution/Results: MAE significantly enhances cross-task generalization in mathematical reasoning, logical deduction, and commonsense question answering. Applied to Qwen2.5-3B-Instruct, it achieves an average improvement of 4.54% across multiple benchmarks, demonstrating strong scalability and task-agnostic adaptability without external tooling or human annotation.
📝 Abstract
Reinforcement Learning (RL) has demonstrated significant potential in enhancing the reasoning capabilities of large language models (LLMs). However, the success of RL for LLMs relies heavily on human-curated datasets and verifiable rewards, which limits its scalability and generality. Recent self-play RL methods, inspired by the paradigm's success in games such as Go, aim to enhance LLM reasoning without human-annotated data. However, these methods primarily depend on a grounded environment for feedback (e.g., a Python interpreter or a game engine), and extending them to general domains remains challenging. To address these challenges, we propose Multi-Agent Evolve (MAE), a framework that enables LLMs to self-evolve in solving diverse tasks, including mathematics, reasoning, and general-knowledge Q&A. The core design of MAE is a triplet of interacting agents (Proposer, Solver, Judge) instantiated from a single LLM, whose behaviors are optimized with reinforcement learning. The Proposer generates questions, the Solver attempts solutions, and the Judge evaluates both; all three roles co-evolve. Experiments on Qwen2.5-3B-Instruct demonstrate that MAE achieves an average improvement of 4.54% across multiple benchmarks. These results highlight MAE as a scalable, data-efficient method for enhancing the general reasoning abilities of LLMs with minimal reliance on human-curated supervision.
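The Proposer–Solver–Judge loop described above can be sketched as a single self-play round. This is a minimal toy sketch, not the paper's implementation: the `stub_llm` function, the role names passed to it, and the binary reward are illustrative assumptions standing in for one shared LLM queried under three different role prompts.

```python
import random

def stub_llm(role, prompt):
    """Toy stand-in for a single shared LLM queried in three roles.
    (Assumption: MAE uses one LLM with different role prompts; this stub
    only generates and checks simple arithmetic questions.)"""
    if role == "proposer":
        # Proposer role: generate a new question.
        a, b = random.randint(1, 9), random.randint(1, 9)
        return f"What is {a} + {b}?"
    if role == "solver":
        # Solver role: attempt an answer to the proposed question.
        a, b = [int(t) for t in prompt.replace("?", "").split() if t.isdigit()]
        return str(a + b)
    if role == "judge":
        # Judge role: score the (question, answer) pair; 1.0 if correct.
        question, answer = prompt
        a, b = [int(t) for t in question.replace("?", "").split() if t.isdigit()]
        return 1.0 if int(answer) == a + b else 0.0

def self_play_round(llm):
    """One closed-loop round: propose, solve, judge. The returned reward
    would drive the RL update of all three roles in the full framework."""
    question = llm("proposer", "generate a question")
    answer = llm("solver", question)
    reward = llm("judge", (question, answer))
    return question, answer, reward

if __name__ == "__main__":
    q, a, r = self_play_round(stub_llm)
    print(q, a, r)
```

In the full framework the reward signal from the Judge (and the Solver's success rate, as a difficulty signal for the Proposer) would feed a reinforcement-learning update of the shared model; here the loop only illustrates the data flow among the three roles.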