SuperGen: An Efficient Ultra-high-resolution Video Generation System with Sketching and Tiling

📅 2025-08-25

🤖 AI Summary

Generating ultra-high-resolution (2K/4K) videos on standard-resolution (e.g., 720p) platforms normally incurs prohibitive computational and memory overhead and requires retraining. To address this, the paper proposes SuperGen, a training-free, tile-based video generation framework built on three key innovations: (1) a training-free tiling mechanism that supports a wide range of output resolutions without fine-tuning the model; (2) a region-aware caching strategy that reuses intermediate results across denoising steps and spatial regions to eliminate redundant computation; and (3) cache-guided tile parallelism with communication-minimized scheduling to maximize throughput and hardware utilization. Across multiple benchmarks, SuperGen achieves state-of-the-art efficiency: it reduces GPU memory consumption by up to 68% and substantially lowers computational complexity while preserving video quality. Notably, it is the first framework to enable efficient single-GPU 4K video generation without architectural or training modifications.
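To see why tiling cuts cost so sharply, consider a back-of-the-envelope model. It rests on our own simplifying assumptions, not on an analysis from the paper: the model applies full self-attention over all tokens of a frame, the token count scales with pixel count at a fixed patch size, both resolutions use the same number of frames, and tiles do not overlap; linear layers and the VAE are ignored. The helper names `pixel_ratio` and `attention_cost_ratio` are hypothetical. Under these assumptions, going from 720p to 4K multiplies the token count by 9 and full-attention cost by 81, whereas nine 720p-sized tiles cost only 9 times a single 720p pass, and no tile's attention matrix is larger than at 720p.

```python
# Idealized cost model (illustrative assumption, not SuperGen's own analysis):
# full self-attention over N tokens costs O(N^2), so splitting the frame into
# k equal, non-overlapping tiles costs k * (N / k)^2 = N^2 / k.


def pixel_ratio(w_hi, h_hi, w_lo, h_lo):
    """How many times more pixels (and, at a fixed patch size, tokens) the target has."""
    return (w_hi * h_hi) / (w_lo * h_lo)


def attention_cost_ratio(tokens_ratio, num_tiles=1):
    """Attention cost relative to one base-resolution pass.

    tokens_ratio: target tokens / base tokens.
    num_tiles: number of equal, non-overlapping tiles the target is split into.
    """
    tokens_per_tile = tokens_ratio / num_tiles
    return num_tiles * tokens_per_tile ** 2


r = pixel_ratio(3840, 2160, 1280, 720)        # 4K vs. 720p
print(r)                                      # 9.0
print(attention_cost_ratio(r))                # 81.0: one full-attention pass at 4K
print(attention_cost_ratio(r, num_tiles=9))   # 9.0: nine 720p-sized tiles
```

Practical tiling adds overlap between neighboring tiles, so the real saving is somewhat smaller than this idealized figure.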

📝 Abstract

Diffusion models have recently achieved remarkable success in generative tasks (e.g., image and video generation), and demand for high-quality content (e.g., 2K/4K videos) is rising rapidly across many domains. However, generating ultra-high-resolution videos on existing standard-resolution (e.g., 720p) platforms remains challenging because of excessive retraining requirements and prohibitively high computational and memory costs. To this end, we introduce SuperGen, an efficient tile-based framework for ultra-high-resolution video generation. SuperGen features a novel training-free tiling algorithm that supports a wide range of resolutions without additional training while significantly reducing both memory footprint and computational complexity. Moreover, SuperGen incorporates a tile-tailored, adaptive, region-aware caching strategy that accelerates video generation by exploiting redundancy across denoising steps and spatial regions. SuperGen also integrates cache-guided, communication-minimized tile parallelism to increase throughput and reduce latency. Evaluations demonstrate that SuperGen delivers the largest performance gains while maintaining high output quality across various benchmarks.
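The abstract does not spell out how tiles are cut and merged. A common training-free pattern, shown here as a minimal sketch and not as SuperGen's actual algorithm, splits the latent into overlapping spatial tiles that each match the base model's native size, runs the model on every tile, and averages the outputs where tiles overlap. The function `fn` is a hypothetical stand-in for one denoising step of a base-resolution model; the `(..., H, W)` layout, the tile size, and uniform averaging in overlaps are illustrative assumptions, as are the names `tile_starts` and `tiled_apply`.

```python
import numpy as np


def tile_starts(length, tile, stride):
    """Start offsets so that tiles of size `tile` cover [0, length) completely."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] + tile < length:   # add a final tile flush with the edge
        starts.append(length - tile)
    return starts


def tiled_apply(latent, fn, tile=64, overlap=16):
    """Apply `fn` to overlapping spatial tiles of `latent` (shape (..., H, W))
    and average the per-tile outputs wherever tiles overlap."""
    assert 0 <= overlap < tile, "overlap must be smaller than the tile size"
    H, W = latent.shape[-2:]
    stride = tile - overlap
    out = np.zeros_like(latent, dtype=np.float64)
    weight = np.zeros((H, W), dtype=np.float64)   # how many tiles cover each pixel
    for y in tile_starts(H, tile, stride):
        for x in tile_starts(W, tile, stride):
            patch = latent[..., y:y + tile, x:x + tile]
            out[..., y:y + tile, x:x + tile] += fn(patch)
            weight[y:y + tile, x:x + tile] += 1.0
    return out / weight   # broadcasts over the leading (e.g. channel) axes


latent = np.random.default_rng(0).standard_normal((4, 100, 150))
merged = tiled_apply(latent, lambda p: p)   # identity "denoiser"
print(np.allclose(merged, latent))          # True
```

With an identity `fn`, the merge reproduces the input exactly, which is a quick check that every position is covered and the overlap weights are normalized. In tiled diffusion approaches of this kind, the merge runs at every denoising step, which is what lets neighboring tiles agree along their shared borders.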

Problem

Research questions and friction points this paper is trying to address.

Generating ultra-high-resolution videos efficiently

Reducing computational and memory costs

Supporting various resolutions without retraining

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free tiling for ultra-high-resolution video generation

Adaptive region-aware caching to accelerate generation

Communication-minimized tile parallelism for enhanced throughput
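The abstract states that the cache exploits redundancy across denoising steps and spatial regions and that tile parallelism is cache-guided, but it does not give the rules. The sketch below illustrates one plausible combination under our own assumptions: a tile whose input changed by less than a relative-L1 threshold since the previous step reuses its cached output, and only the tiles that must be recomputed are spread across devices with the standard greedy longest-processing-time heuristic. `relative_change`, `plan_step`, `assign_tiles`, and the 0.05 threshold are all hypothetical, not SuperGen's API or published criterion.

```python
import heapq

import numpy as np

# Hypothetical sketch: the reuse criterion and the scheduler below are
# illustrative assumptions, not SuperGen's published method.


def relative_change(prev, cur):
    """Relative L1 change of a tile's input between two denoising steps."""
    return float(np.abs(cur - prev).mean() / (np.abs(prev).mean() + 1e-8))


def plan_step(prev_inputs, cur_inputs, threshold=0.05):
    """Split tile ids into (reuse cached output, recompute) by input change."""
    reuse, recompute = [], []
    for tid, cur in cur_inputs.items():
        prev = prev_inputs.get(tid)
        if prev is not None and relative_change(prev, cur) < threshold:
            reuse.append(tid)
        else:
            recompute.append(tid)   # new tile or changed too much
    return reuse, recompute


def assign_tiles(costs, num_devices):
    """Greedy longest-processing-time assignment of tiles to devices."""
    heap = [(0.0, d) for d in range(num_devices)]   # (current load, device id)
    plan = {d: [] for d in range(num_devices)}
    for tid in sorted(costs, key=costs.get, reverse=True):
        load, d = heapq.heappop(heap)               # least-loaded device
        plan[d].append(tid)
        heapq.heappush(heap, (load + costs[tid], d))
    return plan


prev = {0: np.ones((4, 8, 8)), 1: np.ones((4, 8, 8))}
cur = {0: prev[0] * 1.01, 1: prev[1] * 2.0, 2: np.ones((4, 8, 8))}
reuse, recompute = plan_step(prev, cur)             # reuse == [0], recompute == [1, 2]
plan = assign_tiles({tid: 1.0 for tid in recompute}, num_devices=2)
print(plan)                                         # {0: [1], 1: [2]}
```

Balancing only the recomputed tiles lets devices whose tiles hit the cache absorb more of the remaining work. A communication-minimizing scheduler would additionally try to keep each tile on the device that already holds its cached features, which this sketch does not model.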
