🤖 AI Summary
To address poor temporal consistency, computational intractability, and low deployment efficiency in autoregressive long-video generation, this paper proposes a scalable block-wise autoregressive world model. It partitions videos into fixed-length frame chunks and introduces a temporally monotonic denoising mechanism to enforce causal modeling and streaming generation. The work introduces three key innovations: (1) chunk-level monotonic noise scheduling, (2) chunk-wise prompt conditioning, and (3) constant-memory inference, which together address the bottleneck of long-range temporal modeling. The method integrates a large-scale diffusion architecture, MagiAttention sparse attention, chunked denoising training, and a custom distributed inference stack. The largest model contains 24 billion parameters and supports up to 4 million tokens of context. On text-conditioned image-to-video (I2V) generation, it achieves high-fidelity, temporally coherent real-time synthesis, with peak GPU memory consumption independent of video length.
📝 Abstract
We present MAGI-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, MAGI-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, which are made possible by several algorithmic innovations and a dedicated infrastructure stack. MAGI-1 facilitates controllable generation via chunk-wise prompting and supports real-time, memory-efficient deployment by maintaining constant peak inference cost, regardless of video length. The largest variant of MAGI-1 comprises 24 billion parameters and supports context lengths of up to 4 million tokens, demonstrating the scalability and robustness of our approach. The code and models are available at https://github.com/SandAI-org/MAGI-1 and https://github.com/SandAI-org/MagiAttention. The product can be accessed at https://sand.ai.
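The abstract's core mechanism, denoising per-chunk noise that increases monotonically over time so that earlier chunks finish first and can be streamed out while later chunks are still noisy, can be sketched with a toy pipeline. This is a minimal illustrative sketch, not the MAGI-1 implementation: the chunk sizes, step counts, and the `denoise_step` stand-in (which simply shrinks the noise) are all assumptions made for clarity.

```python
import random

CHUNK_FRAMES = 4      # frames per fixed-length chunk (assumed)
NUM_CHUNKS = 3        # chunks denoised concurrently in the pipeline (assumed)
STEPS_PER_CHUNK = 4   # denoising steps a chunk needs before it is "clean"

def denoise_step(chunk, steps_done):
    """Toy stand-in for one model denoising step: shrink remaining noise.
    After STEPS_PER_CHUNK calls the chunk is fully denoised (all zeros here)."""
    remaining = (STEPS_PER_CHUNK - steps_done - 1) / STEPS_PER_CHUNK
    return [x * remaining for x in chunk]

def streaming_generate(num_output_chunks):
    """Emit clean chunks one at a time. Because chunks are admitted in
    temporal order, the number of completed denoising steps decreases
    from the oldest active chunk to the newest, i.e. noise increases
    monotonically along the time axis -- the chunk-level monotonic
    noise schedule that enables causal, streaming generation."""
    pipeline = []           # list of [chunk_frames, steps_done], oldest first
    outputs = []
    next_chunk_id = 0
    while len(outputs) < num_output_chunks:
        # Admit a new fully-noised chunk if the pipeline has room.
        if len(pipeline) < NUM_CHUNKS and next_chunk_id < num_output_chunks:
            pipeline.append([[random.gauss(0, 1) for _ in range(CHUNK_FRAMES)], 0])
            next_chunk_id += 1
        # One parallel step over all concurrently active chunks.
        for state in pipeline:
            state[0] = denoise_step(state[0], state[1])
            state[1] += 1
        # The oldest (least noisy) chunk finishes first and is streamed out,
        # so peak memory stays bounded by NUM_CHUNKS regardless of length.
        if pipeline and pipeline[0][1] >= STEPS_PER_CHUNK:
            outputs.append(pipeline.pop(0)[0])
    return outputs

chunks = streaming_generate(5)
print(len(chunks))  # 5 chunks emitted in temporal order
```

The key property the sketch shows is that inference memory is bounded by the number of concurrently active chunks, not by the total video length, which matches the constant peak inference cost claimed above.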