Who Deserves the Reward? SHARP: Shapley Credit-based Optimization for Multi-Agent System

πŸ“… 2026-02-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of ambiguous credit assignment in multi-agent reinforcement learning, which often leads to unstable training and inaccurate evaluation of individual contributions. To this end, the authors propose SHARP, a framework that they present as the first to introduce Shapley values into multi-agent credit assignment. SHARP employs a hierarchical reward mechanism that integrates global rewards, marginal credit rewards, and tool-process rewards to enable precise contribution attribution and policy optimization. The approach further incorporates hierarchical advantage normalization, trajectory grouping analysis, and large language model–based tool invocation. Evaluated across multiple benchmark tasks, SHARP significantly outperforms both single-agent and state-of-the-art multi-agent methods, achieving average matching-accuracy improvements of 23.66% and 14.05%, respectively.
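The "hierarchical advantage normalization" mentioned above can be illustrated with a minimal sketch. The function name, grouping scheme, and epsilon constant below are assumptions for illustration, not the paper's implementation: the idea is simply that each agent's per-trajectory rewards are standardized within its own trajectory group, so agent-specific advantages stay on comparable scales during policy updates.

```python
import numpy as np

def group_normalized_advantages(agent_rewards, eps=1e-8):
    """Illustrative sketch: for each agent, normalize its per-trajectory
    rewards within one trajectory group:
        advantage = (reward - group mean) / (group std + eps).
    `agent_rewards` maps agent name -> list of rewards in the group."""
    advantages = {}
    for agent, rewards in agent_rewards.items():
        r = np.asarray(rewards, dtype=float)
        advantages[agent] = (r - r.mean()) / (r.std() + eps)
    return advantages

# Toy group of three sampled trajectories for a hypothetical "planner" agent.
adv = group_normalized_advantages({"planner": [1.0, 2.0, 3.0]})
```

Normalizing per agent (rather than broadcasting one global baseline) is what lets each agent's update reflect its own reward scale.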

πŸ“ Abstract
Integrating Large Language Models (LLMs) with external tools via multi-agent systems offers a promising new paradigm for decomposing and solving complex problems. However, training these systems remains notoriously difficult due to the credit assignment challenge, as it is often unclear which specific functional agent is responsible for the success or failure of decision trajectories. Existing methods typically rely on sparse or globally broadcast rewards, failing to capture individual contributions and leading to inefficient reinforcement learning. To address these limitations, we introduce the Shapley-based Hierarchical Attribution for Reinforcement Policy (SHARP), a novel framework for optimizing multi-agent reinforcement learning via precise credit attribution. SHARP effectively stabilizes training by normalizing agent-specific advantages across trajectory groups, primarily through a decomposed reward mechanism comprising a global broadcast-accuracy reward, a Shapley-based marginal-credit reward for each agent, and a tool-process reward to improve execution efficiency. Extensive experiments across various real-world benchmarks demonstrate that SHARP significantly outperforms recent state-of-the-art baselines, achieving average match improvements of 23.66% and 14.05% over single-agent and multi-agent approaches, respectively.
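To make the Shapley-based marginal-credit reward concrete, here is a minimal sketch of exact Shapley value computation over agent coalitions. The agent names and the toy team-reward function are invented for the example and are not from the paper; exact enumeration is exponential in the number of agents, which remains feasible for the small agent teams typical of LLM tool pipelines.

```python
from itertools import combinations
from math import factorial

def shapley_values(agents, value):
    """Exact Shapley values: each agent's credit is its marginal
    contribution value(S + {agent}) - value(S), averaged over all
    coalitions S with the standard ordering weights."""
    n = len(agents)
    phi = {a: 0.0 for a in agents}
    for a in agents:
        others = [b for b in agents if b != a]
        for k in range(n):
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[a] += weight * (value(s | {a}) - value(s))
    return phi

# Toy team reward: a hypothetical "planner" and "tool" agent are
# complementary; a "critic" agent contributes nothing here.
def team_reward(coalition):
    if {"planner", "tool"} <= coalition:
        return 1.0
    if "planner" in coalition:
        return 0.4
    return 0.0

credits = shapley_values(["planner", "tool", "critic"], team_reward)
# credits == {'planner': 0.7, 'tool': 0.3, 'critic': 0.0}
```

Note the efficiency property: the credits sum exactly to the full-team reward (0.7 + 0.3 + 0.0 = 1.0), which is what makes Shapley values a principled decomposition of a shared trajectory reward into per-agent marginal-credit rewards.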
Problem

Research questions and friction points this paper is trying to address.

credit assignment
multi-agent reinforcement learning
reward allocation
Shapley value
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shapley value
credit assignment
multi-agent reinforcement learning
reward decomposition
large language models
πŸ”Ž Similar Papers
No similar papers found.
Yanming Li
North Carolina State University
Xuelin Zhang
Didichuxing Co. Ltd; Sun Yat-sen University
WenJie Lu
Didichuxing Co. Ltd
Ziye Tang
Southeast University
Maodong Wu
Sun Yat-sen University
Haotian Luo
Sun Yat-sen University
Tongtong Wu
Monash University
Zijie Peng
Sun Yat-sen University
Hongze Mi
Didichuxing Co. Ltd; Tianjin University
Yibo Feng
The Chinese University of Hong Kong, Shenzhen
Naiqiang Tan
Didichuxing Co. Ltd
Chao Huang
Sun Yat-sen University
Hong Chen
Department of Mathematics and Statistics, Huazhong Agricultural University
Learning Theory; Machine Learning
Li Shen
Associate Professor, Sun Yat-sen University
Machine Learning; Optimization