NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

📅 2026-05-01
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the poor scaling of Monte Carlo Tree Search (MCTS) in cooperative multi-agent settings, where the joint action space grows exponentially with the number of agents and cannot be explored exhaustively under a realistic search budget. The authors propose NonZero, which avoids explicit enumeration of joint actions by running surrogate-guided selection over a low-dimensional nonlinear representation with an interaction-guided proposal mechanism. Its interaction score ranks single-agent deviations by predicted gain and scores two-agent deviations with a pairwise mixed-difference measure. Candidate proposal is formulated as a bandit problem over local deviations, for which the authors derive a sublinear local-regret guarantee. Empirically, the approach improves sample efficiency and final performance over strong model-based and model-free baselines on MatGame, SMAC, and SMACv2 under matched search budgets.
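To make the "candidate proposal as a bandit problem" framing concrete, here is a minimal sketch in which each candidate deviation is an arm and a generic UCB1 index decides which one to try next. UCB1 is a stand-in chosen for illustration; this page does not specify the paper's actual proposal rule, and `ucb_propose`, `pull`, the arm labels, and the toy gains are all hypothetical names and values.

```python
import math
from typing import Callable, Dict, Hashable, List, Tuple


def ucb_propose(
    arms: List[Hashable],
    pull: Callable[[Hashable], float],
    rounds: int,
    c: float = 1.0,
) -> Tuple[Dict[Hashable, int], Dict[Hashable, float]]:
    """Generic UCB1 over candidate deviations (illustrative stand-in only)."""
    counts = {arm: 0 for arm in arms}
    means = {arm: 0.0 for arm in arms}
    for t in range(1, rounds + 1):
        untried = [arm for arm in arms if counts[arm] == 0]
        if untried:
            # Try every candidate once before using the confidence bonus.
            arm = untried[0]
        else:
            arm = max(
                arms,
                key=lambda a: means[a] + c * math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(arm)
        counts[arm] += 1
        # Incremental running-mean update.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


# Hypothetical toy gains: each single-agent deviation hurts, the joint one helps.
TOY_GAIN = {"agent0": -1.0, "agent1": -1.0, "agent0+agent1": 1.0}

counts, means = ucb_propose(list(TOY_GAIN), lambda arm: TOY_GAIN[arm], rounds=50)
best = max(counts, key=counts.get)
print(best, counts)
```

In this toy run the search budget concentrates on the coordinated two-agent deviation without ever enumerating a full joint-action table, which is the intuition behind proposing over local deviations.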
📝 Abstract
Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint actions, severely limiting exploration under realistic search budgets. We propose NonZero, which keeps multi-agent MCTS tractable by running surrogate-guided selection over a low-dimensional nonlinear representation using an interaction-guided proposal rule, instead of directly exploring the full joint-action space. Our exploration uses an interaction score: single-agent deviations are ranked by predicted gain, while two-agent deviations are scored by a mixed-difference measure that reveals coordination benefits even when no single agent can improve alone. We formalize candidate proposal as a bandit problem over local deviations and derive a proposal rule, NonZero, with a sublinear local-regret guarantee for reaching approximate graph-local optima without enumerating the joint-action space. Empirically, NonZero improves sample efficiency and final performance on MatGame, SMAC, and SMACv2 relative to strong model-based and model-free baselines under matched search budgets.
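The contrast between single-agent gain and a two-agent mixed difference can be illustrated with a small sketch. It assumes the standard second-order mixed difference, Q(a_i', a_j') - Q(a_i', a_j) - Q(a_i, a_j') + Q(a_i, a_j), which may differ from the exact measure used in the paper; the helper names and the 2x2 payoff table are hypothetical.

```python
from typing import Callable, Dict, Tuple

JointAction = Tuple[int, ...]
ValueFn = Callable[[JointAction], float]


def deviate(joint: JointAction, changes: Dict[int, int]) -> JointAction:
    """Return a copy of `joint` with the listed agents switched to new actions."""
    out = list(joint)
    for agent, action in changes.items():
        out[agent] = action
    return tuple(out)


def single_gain(q: ValueFn, joint: JointAction, i: int, ai: int) -> float:
    """Predicted gain when only agent i deviates to action ai."""
    return q(deviate(joint, {i: ai})) - q(joint)


def mixed_difference(
    q: ValueFn, joint: JointAction, i: int, ai: int, j: int, aj: int
) -> float:
    """Second-order mixed difference for a joint deviation of agents i and j.

    Assumed form (not confirmed by the source):
    Q(both) - Q(only i) - Q(only j) + Q(base).
    """
    both = q(deviate(joint, {i: ai, j: aj}))
    only_i = q(deviate(joint, {i: ai}))
    only_j = q(deviate(joint, {j: aj}))
    return both - only_i - only_j + q(joint)


# Hypothetical 2-agent coordination game: (1, 1) is best, but leaving (0, 0) alone hurts.
PAYOFF = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}


def q(joint: JointAction) -> float:
    return PAYOFF[joint]


base = (0, 0)
gain_0 = single_gain(q, base, 0, 1)               # agent 0 deviates alone
gain_1 = single_gain(q, base, 1, 1)               # agent 1 deviates alone
pair_score = mixed_difference(q, base, 0, 1, 1, 1)  # both deviate together
print(gain_0, gain_1, pair_score)
```

Here both single-agent gains are negative while the pair score is positive, so a ranking based only on individual gains would never propose the coordinated move to (1, 1).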
Problem

Research questions and friction points this paper is trying to address.

Multi-Agent
Monte Carlo Tree Search
Joint Action Space
Exploration
Cooperative
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Monte Carlo Tree Search
Interaction-Guided Exploration
Joint-Action Space Reduction
Local Regret Guarantee
Coordination-Aware Bandit