COMBAT: Conditional World Models for Behavioral Agent Training

📅 2026-02-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing world models struggle to generate dynamic opponents that respond intelligently to player behavior. This work proposes COMBAT, a diffusion-based, real-time, action-controllable world model trained on *Tekken 3* using only single-player (Player 1) inputs. It combines a 1.2-billion-parameter Diffusion Transformer with a deep compression autoencoder, and uses causal distillation and diffusion forcing to reach real-time inference. Without any explicit supervision of the opponent's policy, the model implicitly learns opponent behavior that reacts to the player's actions, which shows that it can learn from partially observed data. To benchmark this emergent agent behavior, the study also introduces novel evaluation methods.

📝 Abstract

Recent advances in video generation have spurred the development of world models capable of simulating 3D-consistent environments and interactions with static objects. However, a significant limitation remains in their ability to model dynamic, reactive agents that can intelligently influence and interact with the world. To address this gap, we introduce COMBAT, a real-time, action-controlled world model trained on the complex 1v1 fighting game Tekken 3. Our work demonstrates that diffusion models can successfully simulate a dynamic opponent that reacts to player actions, learning its behavior implicitly. Our approach utilizes a 1.2 billion parameter Diffusion Transformer, conditioned on latent representations from a deep compression autoencoder. We employ state-of-the-art techniques, including causal distillation and diffusion forcing, to achieve real-time inference. Crucially, we observe the emergence of sophisticated agent behavior by training the model solely on single-player inputs, without any explicit supervision for the opponent's policy. Unlike traditional imitation learning methods, which require complete action labels, COMBAT learns effectively from partially observed data to generate responsive behaviors for a controllable Player 1. We present an extensive study and introduce novel evaluation methods to benchmark this emergent agent behavior, establishing a strong foundation for training interactive agents within diffusion-based world models.
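The abstract names diffusion forcing and Player 1 action conditioning but does not give the training procedure. As a rough illustration only, the sketch below shows the general idea behind diffusion forcing: each frame in a clip is noised at its own independently sampled level, and the denoiser is conditioned on Player 1 actions alone. The function name `diffusion_forcing_batch`, the latent shapes, the cosine schedule, and the number of noise levels are all assumptions made for this example, not details taken from COMBAT.

```python
import numpy as np

def diffusion_forcing_batch(latents, p1_actions, num_levels=1000, rng=None):
    """Noise each frame of a latent clip at its own, independently sampled level.

    latents:    (T, D) array of per-frame latents (hypothetical autoencoder output)
    p1_actions: (T,) integer array of Player 1 action ids (the only action labels used)
    """
    rng = np.random.default_rng() if rng is None else rng
    num_frames = latents.shape[0]

    # One noise level per frame, sampled independently: the core of diffusion forcing.
    levels = rng.integers(0, num_levels, size=num_frames)

    # Illustrative cosine schedule (an assumption): alpha_bar is 1 at level 0
    # (clean frame) and approaches 0 at the highest level (almost pure noise).
    alpha_bar = np.cos(0.5 * np.pi * levels / num_levels) ** 2

    eps = rng.standard_normal(latents.shape)
    noisy = (np.sqrt(alpha_bar)[:, None] * latents
             + np.sqrt(1.0 - alpha_bar)[:, None] * eps)

    # A denoiser would take (noisy, levels, actions) and be trained to predict eps.
    # Player 2 actions are never supplied, so any opponent behavior has to be
    # inferred from the frames themselves.
    return {"noisy": noisy, "levels": levels, "actions": p1_actions, "target": eps}


# Usage: an 8-frame clip with 4-dimensional latents and dummy action ids.
rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 4))
actions = np.arange(8)
batch = diffusion_forcing_batch(clip, actions, rng=rng)
```

Because every frame carries its own noise level, the same network can in principle denoise a new frame while treating earlier frames as nearly clean context, which is what makes this style of training a natural fit for autoregressive, real-time rollout.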

Problem

Research questions and friction points this paper is trying to address.

world models

dynamic agents

reactive behavior

interactive simulation

behavioral modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

conditional world models

diffusion transformer

emergent agent behavior