🤖 AI Summary
To address the prohibitively high training cost of video generation foundation models, this paper proposes an efficient training paradigm under resource constraints: training a medium-sized 7B-parameter diffusion model from scratch using only 665,000 H100 GPU-hours. Methodologically, the authors introduce a lightweight spatiotemporal modeling architecture, a progressive curriculum learning strategy, and a low-overhead fine-tuning and training-resumption mechanism. The core contribution is empirical support for the hypothesis that a medium-sized model can suffice: the 7B model matches or surpasses substantially larger competitors on multiple video generation benchmarks, while exhibiting strong cross-task generalization and rapid adaptation. This design significantly lowers the deployment barrier and computational overhead for downstream applications, offering a scalable, practical alternative to parameter-inefficient large models.
📝 Abstract
This technical report presents a cost-efficient strategy for training a video generation foundation model. We train Seaweed-7B, a mid-sized research model with approximately 7 billion parameters, from scratch using 665,000 H100 GPU hours. Despite being trained with moderate computational resources, Seaweed-7B demonstrates highly competitive performance compared to contemporary video generation models of much larger size. Design choices are especially crucial in a resource-constrained setting, and this report highlights the key decisions that enhance the performance of a medium-sized diffusion model. Empirically, we make two observations: (1) Seaweed-7B achieves performance comparable to, or even surpassing, that of larger models trained with substantially greater GPU resources, and (2) our model exhibits strong generalization ability and can be effectively adapted to a wide range of downstream applications through either lightweight fine-tuning or continued training. See the project page at https://seaweed.video/.