Who Deserves the Reward? SHARP: Shapley Credit-based Optimization for Multi-Agent System

πŸ“… 2026-02-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of ambiguous credit assignment in multi-agent reinforcement learning, which often leads to unstable training and inaccurate evaluation of individual contributions. To this end, the authors propose SHARP, a framework that they present as the first to introduce Shapley values into multi-agent credit assignment. SHARP employs a hierarchical reward mechanism that integrates global rewards, marginal credit rewards, and tool-process rewards to enable precise contribution attribution and policy optimization. The approach further incorporates hierarchical advantage normalization, trajectory grouping analysis, and large language model–based tool invocation. Evaluated across multiple benchmark tasks, SHARP significantly outperforms both single-agent and state-of-the-art multi-agent methods, achieving average matching-accuracy improvements of 23.66% and 14.05%, respectively.
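The "hierarchical advantage normalization" mentioned above can be illustrated with a minimal sketch. The function name, grouping scheme, and epsilon constant below are assumptions for illustration, not the paper's implementation: the idea is simply that each agent's per-trajectory rewards are standardized within its own trajectory group, so agent-specific advantages stay on comparable scales during policy updates.

```python
import numpy as np

def group_normalized_advantages(agent_rewards, eps=1e-8):
    """Illustrative sketch: for each agent, normalize its per-trajectory
    rewards within one trajectory group:
        advantage = (reward - group mean) / (group std + eps).
    `agent_rewards` maps agent name -> list of rewards in the group."""
    advantages = {}
    for agent, rewards in agent_rewards.items():
        r = np.asarray(rewards, dtype=float)
        advantages[agent] = (r - r.mean()) / (r.std() + eps)
    return advantages

# Toy group of three sampled trajectories for a hypothetical "planner" agent.
adv = group_normalized_advantages({"planner": [1.0, 2.0, 3.0]})
```

Normalizing per agent (rather than broadcasting one global baseline) is what lets each agent's update reflect its own reward scale.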

πŸ“ Abstract
Integrating Large Language Models (LLMs) with external tools via multi-agent systems offers a promising new paradigm for decomposing and solving complex problems. However, training these systems remains notoriously difficult due to the credit assignment challenge, as it is often unclear which specific functional agent is responsible for the success or failure of decision trajectories. Existing methods typically rely on sparse or globally broadcast rewards, failing to capture individual contributions and leading to inefficient reinforcement learning. To address these limitations, we introduce the Shapley-based Hierarchical Attribution for Reinforcement Policy (SHARP), a novel framework for optimizing multi-agent reinforcement learning via precise credit attribution. SHARP effectively stabilizes training by normalizing agent-specific advantages across trajectory groups, primarily through a decomposed reward mechanism comprising a global broadcast-accuracy reward, a Shapley-based marginal-credit reward for each agent, and a tool-process reward to improve execution efficiency. Extensive experiments across various real-world benchmarks demonstrate that SHARP significantly outperforms recent state-of-the-art baselines, achieving average match improvements of 23.66% and 14.05% over single-agent and multi-agent approaches, respectively.
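To make the Shapley-based marginal-credit reward concrete, here is a minimal sketch of exact Shapley value computation over agent coalitions. The agent names and the toy team-reward function are invented for the example and are not from the paper; exact enumeration is exponential in the number of agents, which remains feasible for the small agent teams typical of LLM tool pipelines.

```python
from itertools import combinations
from math import factorial

def shapley_values(agents, value):
    """Exact Shapley values: each agent's credit is its marginal
    contribution value(S + {agent}) - value(S), averaged over all
    coalitions S with the standard ordering weights."""
    n = len(agents)
    phi = {a: 0.0 for a in agents}
    for a in agents:
        others = [b for b in agents if b != a]
        for k in range(n):
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[a] += weight * (value(s | {a}) - value(s))
    return phi

# Toy team reward: a hypothetical "planner" and "tool" agent are
# complementary; a "critic" agent contributes nothing here.
def team_reward(coalition):
    if {"planner", "tool"} <= coalition:
        return 1.0
    if "planner" in coalition:
        return 0.4
    return 0.0

credits = shapley_values(["planner", "tool", "critic"], team_reward)
# credits == {'planner': 0.7, 'tool': 0.3, 'critic': 0.0}
```

Note the efficiency property: the credits sum exactly to the full-team reward (0.7 + 0.3 + 0.0 = 1.0), which is what makes Shapley values a principled decomposition of a shared trajectory reward into per-agent marginal-credit rewards.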
Problem

Research questions and friction points this paper is trying to address.

credit assignment
multi-agent reinforcement learning
reward allocation
Shapley value
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shapley value
credit assignment
multi-agent reinforcement learning
reward decomposition
large language models
πŸ”Ž Similar Papers
No similar papers found.
Yanming Li
North Carolina State University
Xuelin Zhang
Didichuxing Co. Ltd; Sun Yat-sen University
WenJie Lu
Didichuxing Co. Ltd
Ziye Tang
Southeast University
Maodong Wu
Sun Yat-sen University
Haotian Luo
Sun Yat-sen University
Tongtong Wu
Monash University
Zijie Peng
Sun Yat-sen University
Hongze Mi
Didichuxing Co. Ltd; Tianjin University
Yibo Feng
The Chinese University of Hong Kong, Shenzhen
Naiqiang Tan
Didichuxing Co. Ltd
Chao Huang
Sun Yat-sen University
Hong Chen
Department of Mathematics and Statistics, Huazhong Agricultural University
Learning Theory; Machine Learning
Li Shen
Associate Professor, Sun Yat-sen University
Machine Learning; Optimization