Don't lie to your friends: Learning what you know from collaborative self-play

📅 2025-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
AI agents lack the metacognitive ability to decide when to rely on parametric knowledge, when to invoke external tools, when to abstain from answering, and how to calibrate confidence, which limits their robustness. Supervised fine-tuning generalizes poorly here because it depends on human-annotated demonstrations of each model's capability boundaries. Method: the paper proposes collaborative self-play, in which a population of agents is jointly optimized with a shared group reward so that individual meta-knowledge emerges from the incentives of the interaction, without human annotation of capability boundaries. The framework combines heterogeneous tool access, population-based reinforcement learning, and selective prediction. Results: policies learned in the group transfer to agents deployed in isolation, improving tool-invocation accuracy, assessment of tool-output reliability, and answer abstention: tool misuse decreases by 37%, and selective-prediction AUC increases by 0.22.

📝 Abstract
To be helpful assistants, AI agents must be aware of their own capabilities and limitations. This includes knowing when to answer from parametric knowledge versus using tools, when to trust tool outputs, and when to abstain or hedge. Such capabilities are hard to teach through supervised fine-tuning because they require constructing examples that reflect the agent's specific capabilities. We therefore propose a radically new approach to teaching agents what they know: *collaborative self-play*. We construct multi-agent collaborations in which the group is rewarded for collectively arriving at correct answers. The desired meta-knowledge emerges from the incentives built into the structure of the interaction. We focus on small societies of agents that have access to heterogeneous tools (corpus-specific retrieval), and therefore must collaborate to maximize their success while minimizing their effort. Experiments show that group-level rewards for multi-agent communities can induce policies that *transfer* to improve tool use and selective prediction in settings where individual agents are deployed in isolation.
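The group-level incentive described in the abstract can be sketched as a toy reward function: the society is paid for a correct collective answer, charged for effort (tool calls), and penalized for confident wrong answers, so that abstaining beats guessing. This is a minimal illustrative sketch, not the paper's actual reward; the class names, majority-vote aggregation, and cost weights are all assumptions.

```python
# Hypothetical sketch of a group-level reward with abstention and tool cost.
# All names and weights are illustrative; the paper's reward may differ.
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentAction:
    answer: Optional[str]  # None means the agent abstains
    used_tool: bool        # whether the agent invoked its retrieval tool

def group_reward(actions, gold, tool_cost=0.1, wrong_penalty=1.0):
    """Shared reward: +1 if the group's consensus answer is correct,
    -wrong_penalty if it is wrong, 0 if everyone abstains,
    minus a small cost per tool invocation (effort)."""
    answers = [a.answer for a in actions if a.answer is not None]
    effort = tool_cost * sum(a.used_tool for a in actions)
    if not answers:  # full abstention: neutral, better than being wrong
        return 0.0 - effort
    # Assumed aggregation: simple majority vote over non-abstaining agents.
    consensus = max(set(answers), key=answers.count)
    base = 1.0 if consensus == gold else -wrong_penalty
    return base - effort
```

Under this shaping, an agent that lacks the relevant knowledge is better off abstaining (reward 0) than guessing wrong (reward -1), which is the incentive structure the abstract argues induces individual meta-knowledge.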
Problem

Research questions and friction points this paper is trying to address.

Teaching AI agents self-awareness of capabilities and limitations.
Developing collaborative self-play for multi-agent learning.
Enhancing tool use and selective prediction through group rewards.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Collaborative self-play for AI learning
Multi-agent collaboration with heterogeneous tools
Group-level rewards improve individual tool use