Unifying Temporal and Structural Credit Assignment in LLM-Based Multi-Agent Prompt Optimization

📅 2026-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently optimizing collaborative dynamics in LLM-based multi-agent systems, which is hindered by discrete, non-differentiable computation graphs and sparse global supervision. The authors propose a credit assignment mechanism that decomposes the objective along two axes: temporal credit, which identifies critical interaction rounds via state-space bottlenecks, and structural credit, which isolates individual agent contributions through stationary role policies. To optimize in discrete spaces, they use LLM-generated "proxy gradients" within a verbalized block coordinate descent algorithm that alternates between refining role-specific prompts and aggregation protocols, updating only the components identified as weak links. By attributing trajectory-level failures to specific rounds and roles, the method aims to make optimization both more efficient and more interpretable. The authors report reduced query complexity and improved performance across diverse reasoning benchmarks, and present the approach as a path toward self-improving multi-agent systems.
📝 Abstract
While Multi-Agent Systems (MAS) empower Large Language Models to tackle complex reasoning tasks through collaborative interaction, optimizing their dynamics remains a formidable challenge due to the discrete, non-differentiable nature of the computation graph and the sparsity of global supervisory signals. Existing black-box optimizers struggle to attribute trajectory-level failure to specific local components, resulting in inefficient, high-variance exploration. We argue that tractable MAS optimization needs structural inductive biases to disentangle error signals. We propose temporal and structural credit assignment, which decomposes the objective along two axes: (i) temporal credit, using state-space bottlenecks to identify critical rounds, and (ii) structural credit, using stationary role policies to isolate agent contributions. Leveraging these decomposed signals, we introduce a discrete, verbalized block coordinate descent algorithm for iterative refinement. Rather than indiscriminate global updates, it alternates between optimizing role prompts and aggregation protocols, using LLM-generated "proxy gradients" to target only the identified weak links. Across diverse reasoning benchmarks, our approach substantially reduces query complexity while improving performance, providing a principled and interpretable path toward self-improving MAS.
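The alternating update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how bottleneck rounds are detected, how proxy gradients are prompted, or how candidates are accepted. The names `MASConfig`, `block_coordinate_descent`, `weakest_role`, `critique`, and `rewrite` are hypothetical, the greedy accept-if-better rule is an assumption, and the `toy_*` functions are deterministic stand-ins for what would be LLM calls and benchmark evaluation in a real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class MASConfig:
    """Hypothetical container for the two blocks being optimized."""
    role_prompts: Dict[str, str]   # block 1: one prompt per agent role
    aggregation_protocol: str      # block 2: how agent outputs are combined


def block_coordinate_descent(
    config: MASConfig,
    evaluate: Callable[[MASConfig], float],      # task score, higher is better
    weakest_role: Callable[[MASConfig], str],    # stand-in for credit assignment
    critique: Callable[[MASConfig, str], str],   # textual "proxy gradient"
    rewrite: Callable[[str, str], str],          # applies a critique to a text
    n_iters: int = 4,
) -> Tuple[MASConfig, float]:
    """Alternate between the role-prompt block and the aggregation block."""
    best = evaluate(config)
    for it in range(n_iters):
        if it % 2 == 0:
            # Role block: edit only the role flagged as the weak link.
            role = weakest_role(config)
            feedback = critique(config, role)
            prompts = dict(config.role_prompts)
            prompts[role] = rewrite(prompts[role], feedback)
            candidate = MASConfig(prompts, config.aggregation_protocol)
        else:
            # Aggregation block: role prompts stay fixed.
            feedback = critique(config, "aggregation")
            candidate = MASConfig(
                dict(config.role_prompts),
                rewrite(config.aggregation_protocol, feedback),
            )
        score = evaluate(candidate)
        if score > best:  # assumed acceptance rule: keep strict improvements only
            config, best = candidate, score
    return config, best


# Toy stand-ins (assumptions for illustration only, no LLM involved).
def toy_evaluate(cfg: MASConfig) -> float:
    texts = list(cfg.role_prompts.values()) + [cfg.aggregation_protocol]
    return sum("verify" in t for t in texts) / len(texts)


def toy_weakest_role(cfg: MASConfig) -> str:
    for role, prompt in cfg.role_prompts.items():
        if "verify" not in prompt:
            return role
    return next(iter(cfg.role_prompts))


def toy_critique(cfg: MASConfig, block: str) -> str:
    return "Also verify the answer."


def toy_rewrite(text: str, feedback: str) -> str:
    return text + " " + feedback


start = MASConfig(
    {"solver": "Solve the problem.", "critic": "Find flaws."},
    "Majority vote.",
)
final, final_score = block_coordinate_descent(
    start, toy_evaluate, toy_weakest_role, toy_critique, toy_rewrite
)
```

In this toy run each iteration touches exactly one block, and a candidate is discarded when it does not improve the score, which is the property that distinguishes the scheme from an indiscriminate global rewrite of every prompt at once.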
Problem

Research questions and friction points this paper is trying to address.

credit assignment
multi-agent systems
prompt optimization
large language models
non-differentiable optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

credit assignment
multi-agent systems
prompt optimization
block coordinate descent
large language models