World Models for Policy Refinement in StarCraft II

📅 2026-02-16
🤖 AI Summary
This work addresses the challenge of prospective decision-making in StarCraft II's partially observable, high-dimensional environment, where existing large language model–based agents lack an explicit model of environmental dynamics. The authors propose StarWM, the first world model for StarCraft II that predicts future observations under partial observability, built on a structured textual representation that factorizes observations into five semantic modules. They also introduce SC2-Dynamics-50k, the first instruction-tuning dataset for dynamics prediction in StarCraft II. In offline evaluation, StarWM improves resource prediction accuracy by nearly 60% over zero-shot baselines. Integrated into a Generate–Simulate–Refine decision loop, it raises win rates against the built-in AI by 30%, 15%, and 30% on Hard, Harder, and VeryHard difficulties, respectively, and improves macro-management stability and tactical risk assessment.

📝 Abstract
Large Language Models (LLMs) have recently shown strong reasoning and generalization capabilities, motivating their use as decision-making policies in complex environments. StarCraft II (SC2), with its massive state-action space and partial observability, is a challenging testbed. However, existing LLM-based SC2 agents primarily focus on improving the policy itself and overlook integrating a learnable, action-conditioned transition model into the decision loop. To bridge this gap, we propose StarWM, the first world model for SC2 that predicts future observations under partial observability. To facilitate learning SC2's hybrid dynamics, we introduce a structured textual representation that factorizes observations into five semantic modules, and construct SC2-Dynamics-50k, the first instruction-tuning dataset for SC2 dynamics prediction. We further develop a multi-dimensional offline evaluation framework for predicted structured observations. Offline results show StarWM's substantial gains over zero-shot baselines, including nearly 60% improvements in resource prediction accuracy and self-side macro-situation consistency. Finally, we propose StarWM-Agent, a world-model-augmented decision system that integrates StarWM into a Generate–Simulate–Refine decision loop for foresight-driven policy refinement. Online evaluation against SC2's built-in AI demonstrates consistent improvements, yielding win-rate gains of 30%, 15%, and 30% against Hard (LV5), Harder (LV6), and VeryHard (LV7), respectively, alongside improved macro-management stability and tactical risk assessment.
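The Generate–Simulate–Refine loop described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the class name `GenerateSimulateRefine`, the three callables, the plain-string observation and action formats, and the toy supply-cap rule are all hypothetical stand-ins for the LLM policy and the StarWM world model.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: observations and actions are plain text. The abstract only says
# the representation is structured text; the exact format is not given here.
Observation = str
Action = str


@dataclass
class GenerateSimulateRefine:
    """Hypothetical wrapper around a policy and an action-conditioned world model."""
    generate: Callable[[Observation], Action]                    # policy drafts an action
    simulate: Callable[[Observation, Action], Observation]       # world model predicts the next observation
    refine: Callable[[Observation, Action, Observation], Action] # policy revises using the prediction

    def decide(self, obs: Observation) -> Action:
        draft = self.generate(obs)
        predicted = self.simulate(obs, draft)
        return self.refine(obs, draft, predicted)


# Toy stand-ins (hypothetical rules, chosen only to make the loop observable).
def toy_generate(obs: Observation) -> Action:
    return "train 2 marines"


def toy_simulate(obs: Observation, action: Action) -> Observation:
    # Pretend world model: training units at the supply cap leads to a block.
    return "supply blocked" if "supply 15/15" in obs else "ok"


def toy_refine(obs: Observation, action: Action, predicted: Observation) -> Action:
    # Replace the draft only when the predicted future looks bad.
    return "build supply depot" if predicted == "supply blocked" else action


agent = GenerateSimulateRefine(toy_generate, toy_simulate, toy_refine)
```

With these stubs, `agent.decide("minerals 200, supply 15/15")` replaces the drafted action with a supply-depot build, while an observation below the cap leaves the draft unchanged. The point of the structure is that the refinement step sees the predicted consequence of the draft before anything is executed.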
Problem

Research questions and friction points this paper is trying to address.

World Models
Policy Refinement
StarCraft II
Partial Observability
Action-Conditioned Transition Model
Innovation

Methods, ideas, or system contributions that make the work stand out.

World Model
Partial Observability
Structured Text Representation
Instruction-Tuning Dataset
Policy Refinement
Yixin Zhang
Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
Domain Adaptation, Transfer Learning
Ziyi Wang
The Chinese University of Hong Kong; The University of Texas at Austin
Graph learning, EDA
Yiming Rong
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Haoxi Wang
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Jinling Jiang
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Shuang Xu
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences
Haoran Wu
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences
Shiyu Zhou
Professor of Industrial Engineering
Industrial engineering, manufacturing, quality control, applied statistics
Bo Xu
The Key Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences