🤖 AI Summary
To address the prohibitively high computational and memory costs, as well as the need for retraining, of generating ultra-high-resolution (2K/4K) videos on standard-resolution (e.g., 720p) platforms, this paper proposes SuperGen, a training-free, tile-based video generation framework. SuperGen tackles this challenge through three key innovations: (1) a training-free tiling mechanism that enables arbitrary-resolution output without fine-tuning the model; (2) a region-aware caching strategy that reuses intermediate features across denoising steps to eliminate redundant computation; and (3) cache-guided tile parallelism combined with communication-minimized scheduling to maximize throughput and hardware utilization. Evaluated across multiple benchmarks, SuperGen achieves state-of-the-art efficiency: it reduces GPU memory consumption by up to 68% and significantly lowers computational complexity while preserving video quality. Notably, it is the first framework to enable efficient single-GPU 4K video generation without architectural or training modifications.
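The training-free tiling idea can be illustrated with a minimal, hypothetical sketch. The summary does not specify SuperGen's actual tiling or blending rule, so the code below assumes a generic overlap-and-average scheme: the high-resolution latent is split into overlapping tiles of the model's native size, each tile is denoised independently by a stand-in function, and overlapping outputs are averaged. `tile_starts`, `tiled_denoise_step`, `denoise_tile`, and `toy_denoiser` are illustrative names, not SuperGen's API.

```python
from typing import Callable, List

Grid = List[List[float]]  # a 2-D latent (one channel, one frame) as nested lists

def tile_starts(length: int, tile: int, stride: int) -> List[int]:
    """Start offsets of tiles of size `tile` covering [0, length).
    Assumes stride <= tile so consecutive tiles overlap or touch."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # shift a final tile so it ends exactly at the border
    return starts

def tiled_denoise_step(latent: Grid, tile: int, stride: int,
                       denoise_tile: Callable[[Grid], Grid]) -> Grid:
    """Denoise overlapping tiles independently and average the overlapping outputs."""
    h, w = len(latent), len(latent[0])
    acc = [[0.0] * w for _ in range(h)]   # sum of tile outputs per position
    cnt = [[0] * w for _ in range(h)]     # number of tiles covering each position
    for y0 in tile_starts(h, tile, stride):
        for x0 in tile_starts(w, tile, stride):
            patch = [row[x0:x0 + tile] for row in latent[y0:y0 + tile]]
            out = denoise_tile(patch)     # the model never sees more than a tile-sized input
            for dy, row in enumerate(out):
                for dx, v in enumerate(row):
                    acc[y0 + dy][x0 + dx] += v
                    cnt[y0 + dy][x0 + dx] += 1
    return [[acc[y][x] / cnt[y][x] for x in range(w)] for y in range(h)]

# Usage with a hypothetical stand-in for the diffusion backbone.
calls = []

def toy_denoiser(patch: Grid) -> Grid:
    calls.append((len(patch), len(patch[0])))  # record the tile shape the "model" receives
    return [[0.5 * v for v in row] for row in patch]

latent = [[float(10 * y + x) for x in range(10)] for y in range(6)]  # 6 x 10 latent
result = tiled_denoise_step(latent, tile=4, stride=3, denoise_tile=toy_denoiser)
```

Because `toy_denoiser` acts on each value independently, the blended result equals applying it to the whole latent, which makes the output easy to check. A real denoiser depends on spatial context, so neighboring tiles disagree in their overlap; uniform averaging is one simple way to smooth the resulting seams. In this scheme the memory saving comes from the backbone only ever processing tile-sized inputs.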
📝 Abstract
Diffusion models have recently achieved remarkable success in generative tasks (e.g., image and video generation), and demand for high-quality content (e.g., 2K/4K videos) is rapidly increasing across many domains. However, generating ultra-high-resolution videos on existing standard-resolution (e.g., 720p) platforms remains challenging because of excessive retraining requirements and prohibitively high computational and memory costs. To this end, we introduce SuperGen, an efficient tile-based framework for ultra-high-resolution video generation. SuperGen employs a novel training-free tiling algorithm that supports a wide range of resolutions without additional training while significantly reducing both memory footprint and computational complexity. Moreover, SuperGen incorporates a tile-tailored, adaptive, region-aware caching strategy that accelerates video generation by exploiting redundancy across denoising steps and spatial regions. SuperGen also integrates cache-guided, communication-minimized tile parallelism to increase throughput and reduce latency. Evaluations across various benchmarks demonstrate that SuperGen delivers the highest performance gains while maintaining high output quality.
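The caching and parallelism ideas can be combined in a similarly hedged sketch. The abstract does not give SuperGen's reuse criterion or scheduling policy, so the code assumes two simple stand-ins: a tile's cached model output is reused when its input has changed by at most a relative L1 threshold since the tile was last computed, and each tile is statically owned by worker `tile_id % num_workers`, so cached data never has to move between devices. `WorkerCache`, `cached_parallel_step`, `rel_change`, and the `threshold` value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Tile = List[float]  # a tile's latent values, flattened to 1-D for brevity

def rel_change(new: Tile, old: Tile) -> float:
    """Relative L1 change of a tile's input since it was last computed."""
    num = sum(abs(a - b) for a, b in zip(new, old))
    den = sum(abs(b) for b in old)
    return num / den if den > 0 else num

@dataclass
class WorkerCache:
    """Per-worker cache; tiles never migrate, so cached data never crosses devices."""
    threshold: float  # hypothetical reuse threshold
    inputs: Dict[int, Tile] = field(default_factory=dict)   # input at last real compute
    outputs: Dict[int, Tile] = field(default_factory=dict)  # model output at last real compute

    def run(self, tile_id: int, x: Tile, model: Callable[[Tile], Tile]) -> Tuple[Tile, bool]:
        old = self.inputs.get(tile_id)
        if old is not None and rel_change(x, old) <= self.threshold:
            return self.outputs[tile_id], False  # cache hit: skip the model call
        self.inputs[tile_id] = list(x)
        self.outputs[tile_id] = model(x)
        return self.outputs[tile_id], True

def cached_parallel_step(tiles: Dict[int, Tile], caches: List[WorkerCache],
                         model: Callable[[Tile], Tile]) -> Tuple[Dict[int, Tile], List[int]]:
    """One denoising step; tile t is statically owned by worker t % len(caches).
    Workers are simulated sequentially here; a real system would run them concurrently."""
    outputs: Dict[int, Tile] = {}
    recomputed: List[int] = []
    for t in sorted(tiles):
        out, computed = caches[t % len(caches)].run(t, tiles[t], model)
        outputs[t] = out
        if computed:
            recomputed.append(t)
    return outputs, recomputed

# Usage with a hypothetical stand-in model and two workers.
model = lambda x: [2.0 * v for v in x]
caches = [WorkerCache(threshold=0.05) for _ in range(2)]
step1 = {0: [1.0, 1.0], 1: [2.0, 2.0], 2: [3.0, 3.0]}
_, r1 = cached_parallel_step(step1, caches, model)     # r1 == [0, 1, 2]: cold cache
step2 = {0: [1.01, 1.0], 1: [2.0, 2.0], 2: [4.0, 3.0]}
out2, r2 = cached_parallel_step(step2, caches, model)  # r2 == [2]
```

In the second step only tile 2 changes enough to be recomputed; tiles 0 and 1 reuse their cached outputs. Comparing against the input from the last real computation, rather than from the previous step, keeps many small changes from accumulating unnoticed. Static ownership trades load balance for locality: if most cache misses land on one worker, that worker becomes the bottleneck.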