🤖 AI Summary
This paper investigates the interaction dynamics of multi-agent language models under objective conflict: when two LLM-based agents regulate each other through iterative in-context gradient updates, the system converges to a biased equilibrium in which neither agent achieves its objective. The authors propose a "prompt geometry" framework, a theoretical characterization of how the objective gap and prompt-space structure govern convergence bias, and derive conditions for asymmetric convergence. They further design a first-order adversarial algorithm with provable one-sided success. The methodology combines in-context learning, differentiable prompt optimization, and theoretical analysis, validated empirically on trained Transformer models and GPT-5 using in-context linear regression tasks. The results confirm the theoretically predicted systematic convergence bias and offer quantitative handles on stability, bias direction, and robustness in multi-agent LLM interactions.
📝 Abstract
We develop a theoretical framework for agent-to-agent interactions in multi-agent scenarios. We consider a setup in which two language-model-based agents perform iterative in-context gradient updates toward their respective objectives, each using the other agent's output as input. We characterize the generation dynamics of this interaction when the agents have misaligned objectives, and show that it results in a biased equilibrium where neither agent reaches its target, with the residual errors predictable from the objective gap and the geometry induced by each agent's prompt. We establish conditions for asymmetric convergence and provide an algorithm that provably achieves an adversarial outcome, producing one-sided success. Experiments with trained Transformer models as well as GPT-5 on the task of in-context linear regression validate the theory. Our framework provides a setup for studying, predicting, and defending multi-agent systems, explicitly linking prompt design and interaction setup to stability, bias, and robustness.
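The biased-equilibrium phenomenon the abstract describes can be illustrated with a toy model (not the paper's actual algorithm or proof): two agents alternately take gradient steps on a shared scalar state, each pulling it toward its own target, with each step consuming the other agent's output. When the targets disagree, the iterates settle at a point between the targets, so neither residual error vanishes and both scale with the objective gap. The function name `run` and the quadratic objectives are illustrative assumptions.

```python
def run(target_a, target_b, lr=0.1, steps=200):
    """Alternating gradient dynamics on a shared state x.

    Agent A minimizes (x - target_a)^2, agent B minimizes (x - target_b)^2;
    each agent's update takes the other agent's latest output as its input.
    """
    x = 0.0
    for _ in range(steps):
        x -= 2 * lr * (x - target_a)  # agent A's gradient step
        x -= 2 * lr * (x - target_b)  # agent B's step on A's output
    return x

# Misaligned objectives: the state equilibrates between the two targets.
x_eq = run(1.0, -1.0)
res_a = abs(x_eq - 1.0)  # agent A's residual error
res_b = abs(x_eq + 1.0)  # agent B's residual error
# Neither residual is zero; both are on the order of the gap |target_a - target_b|.
```

With aligned targets (`run(1.0, 1.0)`) the same dynamics converge to the shared target exactly, which highlights that the residual bias is driven by the objective gap, consistent with the paper's claim that residual errors are predictable from that gap.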