AssistanceZero: Scalably Solving Assistance Games

📅 2025-04-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Assistance games model the interaction between an AI assistant and a human user as a two-player game with a shared goal the assistant cannot observe, but so far they have only been solved in simple settings; scaling them requires tractable decision-making under uncertainty and accurate models of human behavior. Method: We present AssistanceZero, the first scalable approach to solving assistance games, and apply it to a new Minecraft-based building assistance game with over 10⁴⁰⁰ possible goals. AssistanceZero extends AlphaZero-style Monte Carlo tree search with a neural network that jointly predicts human actions and rewards, letting the assistant plan under goal uncertainty and avoiding the incentives for deceptive behavior inherent in RLHF. Contributions/Results: Our method significantly outperforms model-free reinforcement learning and imitation learning on the Minecraft benchmark, and a human user study shows a 37.2% reduction in the actions participants need to complete building tasks, supporting assistance games as a practical framework for training effective, aligned AI assistants.

📝 Abstract
Assistance games are a promising alternative to reinforcement learning from human feedback (RLHF) for training AI assistants. Assistance games resolve key drawbacks of RLHF, such as incentives for deceptive behavior, by explicitly modeling the interaction between assistant and user as a two-player game where the assistant cannot observe their shared goal. Despite their potential, assistance games have only been explored in simple settings. Scaling them to more complex environments is difficult because it requires both solving intractable decision-making problems under uncertainty and accurately modeling human users' behavior. We present the first scalable approach to solving assistance games and apply it to a new, challenging Minecraft-based assistance game with over $10^{400}$ possible goals. Our approach, AssistanceZero, extends AlphaZero with a neural network that predicts human actions and rewards, enabling it to plan under uncertainty. We show that AssistanceZero outperforms model-free RL algorithms and imitation learning in the Minecraft-based assistance game. In a human study, our AssistanceZero-trained assistant significantly reduces the number of actions participants take to complete building tasks in Minecraft. Our results suggest that assistance games are a tractable framework for training effective AI assistants in complex environments. Our code and models are available at https://github.com/cassidylaidlaw/minecraft-building-assistance-game.
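The core mechanic the abstract describes — an assistant that cannot observe the shared goal, so it infers the goal from the human's actions and plans in expectation over its posterior — can be illustrated with a toy sketch. Everything here (the goal set, the human model, the reward table) is a hypothetical stand-in, not the paper's actual Minecraft environment or model:

```python
# Toy sketch of the assistance-game loop: maintain a posterior over
# goals, update it from an observed human action (Bayes' rule), and
# score assistant actions in expectation over that posterior.
GOALS = ["tower", "wall"]  # hypothetical goal set

# P(human action | goal): a hand-coded stand-in for a learned human model.
HUMAN_MODEL = {
    "tower": {"place_up": 0.9, "place_side": 0.1},
    "wall":  {"place_up": 0.2, "place_side": 0.8},
}

# R(goal, assistant action): a hypothetical reward table.
REWARD = {
    ("tower", "stack_block"): 1.0, ("tower", "extend_row"): -0.5,
    ("wall", "stack_block"): -0.5, ("wall", "extend_row"): 1.0,
}
ASSISTANT_ACTIONS = ["stack_block", "extend_row"]

def update_posterior(prior, observed_action):
    """Bayes update of the goal posterior after one observed human action."""
    post = {g: prior[g] * HUMAN_MODEL[g][observed_action] for g in GOALS}
    z = sum(post.values())
    return {g: p / z for g, p in post.items()}

def expected_values(posterior):
    """Q(a) = sum_g P(g) * R(g, a): plan in expectation over goals."""
    return {a: sum(posterior[g] * REWARD[(g, a)] for g in GOALS)
            for a in ASSISTANT_ACTIONS}

prior = {g: 1.0 / len(GOALS) for g in GOALS}
posterior = update_posterior(prior, "place_up")  # human built upward
q = expected_values(posterior)
best = max(q, key=q.get)  # the assistant helps with the likelier goal
```

In the paper this expectation is computed inside AlphaZero-style tree search with learned models rather than tables, but the uncertainty-aware structure is the same.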
Problem

Research questions and friction points this paper is trying to address.

Scaling assistance games to complex environments
Accurately modeling human behavior in two-player games
Reducing deceptive incentives in AI assistants
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends AlphaZero with human action prediction
Models human behavior and rewards in games
Plans under goal uncertainty at scale
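The innovations above hinge on one architectural idea: alongside AlphaZero's usual policy and value heads, the network adds heads that predict the human's next action and the unobserved reward. A minimal NumPy sketch of that head structure follows; the dimensions, trunk, and initialization are all hypothetical, chosen only to show the shape of the design:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for a toy state encoding.
STATE_DIM, HIDDEN = 8, 16
N_ASSISTANT_ACTIONS, N_HUMAN_ACTIONS, N_REWARD_PARAMS = 4, 4, 3

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class AssistanceNet:
    """Toy multi-head network: AlphaZero's policy/value heads plus
    heads predicting the human's next action and reward parameters."""

    def __init__(self):
        self.trunk = rng.normal(scale=0.1, size=(STATE_DIM, HIDDEN))
        self.heads = {
            "policy": rng.normal(scale=0.1, size=(HIDDEN, N_ASSISTANT_ACTIONS)),
            "value": rng.normal(scale=0.1, size=(HIDDEN, 1)),
            "human_action": rng.normal(scale=0.1, size=(HIDDEN, N_HUMAN_ACTIONS)),
            "reward": rng.normal(scale=0.1, size=(HIDDEN, N_REWARD_PARAMS)),
        }

    def forward(self, state):
        h = np.tanh(state @ self.trunk)  # shared trunk features
        return {
            "policy": softmax(h @ self.heads["policy"]),
            "value": float((h @ self.heads["value"])[0]),
            "human_action": softmax(h @ self.heads["human_action"]),
            "reward": h @ self.heads["reward"],
        }

net = AssistanceNet()
out = net.forward(np.zeros(STATE_DIM))
```

During tree search, the human-action head stands in for the user's moves and the reward head scores outcomes, which is what lets the planner act without ever observing the true goal.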