LangMARL: Natural Language Multi-Agent Reinforcement Learning

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that large language model agents struggle to learn effectively in dynamic collaborative environments due to sparse global rewards, which fail to provide the causal credit signals necessary for local policy optimization. The study introduces, for the first time, a credit assignment mechanism from multi-agent reinforcement learning into the language space, coupled with a policy gradient evolution framework. By replaying interaction trajectories to infer task-relevant causal relationships, the method generates dense and interpretable feedback signals for individual agents. This approach substantially improves sample efficiency, convergence speed, and generalization under sparse reward conditions, demonstrating consistent effectiveness across a range of cooperative tasks.
📝 Abstract
Large language model (LLM) agents struggle to autonomously evolve coordination strategies in dynamic environments, largely because coarse global outcomes obscure the causal signals needed for local policy refinement. We identify this bottleneck as a multi-agent credit assignment problem, which has long been studied in classical multi-agent reinforcement learning (MARL) but remains underaddressed in LLM-based systems. Building on this observation, we propose LangMARL, a framework that brings credit assignment and policy gradient evolution from cooperative MARL into the language space. LangMARL introduces agent-level language credit assignment, pioneers gradient evolution in language space for policy improvement, and summarizes task-relevant causal relations from replayed trajectories to provide dense feedback and improve convergence under sparse rewards. Extensive experiments across diverse cooperative multi-agent tasks demonstrate improved sample efficiency, interpretability, and strong generalization.
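The credit-assignment idea the abstract imports from cooperative MARL can be illustrated with the classical difference-rewards scheme: an agent's credit is how much the global outcome degrades when its contribution is replayed as a no-op. The sketch below is a minimal toy illustration of that classical mechanism, not the paper's language-space method; all function and agent names are hypothetical.

```python
# Illustrative sketch of classical MARL credit assignment (difference
# rewards), the idea LangMARL lifts into the language space. Each agent's
# credit is the drop in global reward when its action is replaced by a
# no-op in a replayed trajectory. All names here are hypothetical.

from typing import Callable, Dict


def difference_rewards(
    joint_actions: Dict[str, str],  # agent -> action taken in the trajectory
    global_reward: Callable[[Dict[str, str]], float],
    noop: str = "no-op",
) -> Dict[str, float]:
    """Credit agent i with G(a) - G(a with a_i replaced by a no-op)."""
    base = global_reward(joint_actions)
    credits = {}
    for agent in joint_actions:
        counterfactual = dict(joint_actions)
        counterfactual[agent] = noop  # replay without agent i's contribution
        credits[agent] = base - global_reward(counterfactual)
    return credits


# Toy cooperative task with a sparse global reward: the team scores 1.0
# only if both subtasks are covered, 0.0 otherwise.
def team_reward(actions: Dict[str, str]) -> float:
    covered = set(actions.values())
    return 1.0 if {"fetch", "assemble"} <= covered else 0.0


print(difference_rewards({"alice": "fetch", "bob": "assemble"}, team_reward))
# -> {'alice': 1.0, 'bob': 1.0}: removing either agent's action breaks the task
```

Under the sparse team reward, each agent receives a dense, individualized signal (here, 1.0 each) even though the raw environment only emits a single joint outcome; LangMARL's contribution, per the abstract, is to express this kind of counterfactual credit as natural-language feedback rather than a scalar.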
Problem

Research questions and friction points this paper is trying to address.

multi-agent credit assignment
large language models
cooperative multi-agent reinforcement learning
sparse rewards
coordination strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

language-based credit assignment
gradient evolution in language space
multi-agent reinforcement learning
causal feedback from trajectories
LLM agents