Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards

📅 2026-04-10
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the limited strategic capabilities of large language models in bilateral price negotiation under incomplete information. The authors propose a reinforcement learning framework built on verifiable rewards: a 30-billion-parameter buyer agent is trained against a regulated LLM seller across a wide distribution of real-world products, with economic surplus maximization and adherence to a private budget serving directly as reward signals. The study uncovers a four-phase evolution in the agent's negotiation strategy, progressing from naive bargaining to sophisticated persuasive tactics. The trained agent significantly outperforms frontier models more than ten times its size in surplus extraction and generalizes robustly to stronger, unseen sellers, including hostile, adversarial seller personas.

πŸ“ Abstract
The recent advancement of Large Language Models (LLMs) has established their potential as autonomous interactive agents. However, they often struggle in strategic games of incomplete information, such as bilateral price negotiation. In this paper, we investigate whether Reinforcement Learning from Verifiable Rewards (RLVR) can effectively teach LLMs to negotiate. Specifically, we explore the strategic behaviors that emerge during the learning process. We introduce a framework that trains a mid-sized buyer agent against a regulated LLM seller across a wide distribution of real-world products. By grounding reward signals directly in the maximization of economic surplus and strict adherence to private budget constraints, we reveal a novel four-phase strategic evolution. The agent progresses from naive bargaining to using aggressive starting prices, moves through a phase of deadlock, and ultimately develops sophisticated persuasive skills. Our results demonstrate that this verifiable training allows a 30B agent to significantly outperform frontier models over ten times its size in extracting surplus. Furthermore, the trained agent generalizes robustly to stronger counterparties unseen during training and remains effective even when facing hostile, adversarial seller personas.
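The abstract says rewards are grounded in surplus maximization and strict adherence to a private budget, but it does not give the exact formula. The sketch below shows one hypothetical way such a verifiable, per-episode reward could be computed. The function name buyer_reward, the budget-normalized surplus, and the fixed penalty value are assumptions for illustration, not the authors' definition.

```python
from typing import Optional


def buyer_reward(deal_price: Optional[float], budget: float,
                 budget_violation_penalty: float = -1.0) -> float:
    """Hypothetical verifiable reward for a buyer agent (illustrative only).

    deal_price: the agreed price, or None if no deal was reached.
    budget: the buyer's private maximum willingness to pay (must be > 0).
    budget_violation_penalty: assumed fixed penalty for exceeding the budget.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    if deal_price is None:
        # No agreement: no surplus is captured.
        return 0.0
    if deal_price > budget:
        # Agreed above the private budget: the hard constraint is violated.
        return budget_violation_penalty
    # Buyer surplus, normalized by budget so rewards are comparable across products.
    return (budget - deal_price) / budget
```

For example, buyer_reward(80.0, 100.0) returns 0.2, an agreement at 120.0 against a budget of 100.0 returns the penalty -1.0, and a failed negotiation returns 0.0. Because every quantity is checkable from the final transcript (agreed price, private budget), the reward is verifiable rather than judged by another model; normalizing by the budget is just one assumed way to keep rewards on a similar scale across cheap and expensive products.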
Problem

Large Language Models
Negotiation
Strategic Games
Incomplete Information
Autonomous Agents
Innovation

Reinforcement Learning from Verifiable Rewards
strategic negotiation
economic surplus maximization
budget-constrained LLM agents
four-phase strategic evolution
πŸ”Ž Similar Papers
2024-01-29Conference on Empirical Methods in Natural Language ProcessingCitations: 3