Nano World Models: A Minimalist Implementation of Future Video Prediction

📅 2026-05-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing world models lack compact, reproducible, and easily extensible implementations, which hinders systematic study of their core design choices. This work presents Nano World Models, a minimalist codebase for future video prediction built around diffusion forcing. It provides a unified interface for generative objectives, model scales, action-conditioning mechanisms, latent observation spaces, and evaluation protocols, so that design choices usually entangled across separate implementations can be varied independently. The authors analyze how prediction parameterization, model scale, action injection, and sampling budget affect prediction quality and long-horizon autoregressive rollouts across simple control environments, game simulation, and real-robot data. Code, configurations, evaluation scripts, and pretrained checkpoints are publicly released.
📝 Abstract
World models have become a central paradigm for learning predictive simulators that support generation, planning, and decision-making. Yet, despite rapid progress in industry-scale interactive video generation, the broader research community still lacks compact, reproducible, and easily extensible implementations for studying the design choices underlying modern world models. We introduce Nano World Models, a minimalist codebase for future video prediction centered around diffusion forcing. Nano World Models provides a unified interface for generative objectives, model scales, action-conditioning mechanisms, latent observation spaces, datasets, evaluation protocols, and long-horizon rollout procedures. This design enables controlled studies of world-modeling components that are often entangled across separate implementations. Through experiments across simple control environments, game simulation, and real-robot data, we examine how prediction parameterization, architecture scale, action injection, sampling budget, and domain complexity affect video prediction quality and autoregressive rollout behavior. By releasing code, configurations, evaluation scripts, and pretrained checkpoints, Nano World Models aims to provide a compact yet extensible experimental substrate for open, reproducible, and scientific world-model research.
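The abstract names diffusion forcing and prediction parameterization without spelling them out. As a rough illustration only: diffusion forcing, as commonly described, noises each frame in a sequence with its own independently sampled noise level, and the prediction parameterization decides what the network regresses (the noise, the clean frame, or the "v" combination of the two). The sketch below assumes a cosine noise schedule and flat per-frame latent vectors. The names `signal_level`, `noise_frames`, and `training_target` are hypothetical and are not taken from the Nano World Models codebase.

```python
import math
import random


def signal_level(t):
    """Cumulative signal level for noise level t in [0, 1) (assumed cosine schedule)."""
    return math.cos(0.5 * math.pi * t) ** 2


def noise_frames(frames, rng):
    """Noise every frame with its own independently sampled noise level.

    frames: list of frames, each a list of floats (a flat latent vector).
    Returns (noisy_frames, levels, noises), all aligned per frame.
    """
    noisy_frames, levels, noises = [], [], []
    for frame in frames:
        t = rng.random()                      # one noise level per frame
        a = math.sqrt(signal_level(t))        # signal scale
        s = math.sqrt(1.0 - signal_level(t))  # noise scale
        eps = [rng.gauss(0.0, 1.0) for _ in frame]
        noisy_frames.append([a * x + s * e for x, e in zip(frame, eps)])
        levels.append(t)
        noises.append(eps)
    return noisy_frames, levels, noises


def training_target(frame, eps, t, parameterization):
    """Regression target for one frame under a given prediction parameterization."""
    a = math.sqrt(signal_level(t))
    s = math.sqrt(1.0 - signal_level(t))
    if parameterization == "eps":   # predict the added noise
        return list(eps)
    if parameterization == "x0":    # predict the clean frame
        return list(frame)
    if parameterization == "v":     # predict v = a * eps - s * x0
        return [a * e - s * x for x, e in zip(frame, eps)]
    raise ValueError(f"unknown parameterization: {parameterization}")


rng = random.Random(0)
clean = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]  # 8 frames, 4 dims
noisy, levels, noises = noise_frames(clean, rng)
v_targets = [training_target(x, e, t, "v") for x, e, t in zip(clean, noises, levels)]
```

Under these assumptions the three targets carry the same information: given the noisy frame and its noise level, the clean frame can be recovered from a "v" target as `a * noisy - s * v`, so the choice mainly changes how the regression loss is weighted across noise levels.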
Problem

Research questions and friction points this paper is trying to address.

world models
future video prediction
reproducible research
minimalist implementation
design choices
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion forcing
world models
video prediction
minimalist implementation
controlled ablation