SuperGen: An Efficient Ultra-high-resolution Video Generation System with Sketching and Tiling

📅 2025-08-25
🤖 AI Summary
Generating ultra-high-resolution (2K/4K) videos on existing standard-resolution (e.g., 720p) platforms normally requires retraining and incurs prohibitive computational and memory overhead. This paper proposes SuperGen, a training-free, tile-based video generation framework built on three components: (1) a training-free tiling mechanism that supports a wide range of output resolutions without model fine-tuning; (2) an adaptive, region-aware caching strategy that reuses intermediate features across denoising steps and spatial regions to cut redundant computation; and (3) cache-guided, communication-minimized tile parallelism that raises throughput and lowers latency. Evaluated across multiple benchmarks, SuperGen reportedly reduces GPU memory consumption by up to 68% and lowers computational cost while preserving video quality. It is also described as the first framework to enable efficient single-GPU 4K video generation without architectural or training modifications.

📝 Abstract
Diffusion models have recently achieved remarkable success in generative tasks (e.g., image and video generation), and the demand for high-quality content (e.g., 2K/4K videos) is rapidly increasing across various domains. However, generating ultra-high-resolution videos on existing standard-resolution (e.g., 720p) platforms remains challenging due to the excessive re-training requirements and prohibitively high computational and memory costs. To this end, we introduce SuperGen, an efficient tile-based framework for ultra-high-resolution video generation. SuperGen features a novel training-free algorithmic innovation with tiling to successfully support a wide range of resolutions without additional training efforts while significantly reducing both memory footprint and computational complexity. Moreover, SuperGen incorporates a tile-tailored, adaptive, region-aware caching strategy that accelerates video generation by exploiting redundancy across denoising steps and spatial regions. SuperGen also integrates cache-guided, communication-minimized tile parallelism for enhanced throughput and minimized latency. Evaluations demonstrate that SuperGen harvests the maximum performance gains while achieving high output quality across various benchmarks.
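
As a rough illustration of the tiling idea in the abstract, the sketch below splits one axis of a frame into overlapping tiles and merges them back by averaging. It is a minimal example under stated assumptions and not SuperGen's algorithm: the names `tile_starts` and `merge_tiles`, the `overlap` parameter, and the plain averaging merge are all hypothetical, and a real system would likely work on latent tensors rather than Python lists.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of tiles of size `tile` that together cover [0, length).

    Neighbouring tiles share at least `overlap` samples. The last tile is
    clamped to the border, so it may overlap its neighbour by more.
    """
    if tile >= length:
        return [0]
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # clamp the final tile to the border
    return starts


def merge_tiles(tiles, starts, length):
    """Average overlapping 1-D tiles back into one signal (assumed merge rule)."""
    total = [0.0] * length
    count = [0] * length
    for tile, start in zip(tiles, starts):
        for i, value in enumerate(tile):
            total[start + i] += value
            count[start + i] += 1
    return [t / c for t, c in zip(total, count)]


# Toy usage: split a 10-sample signal into tiles of 4 with overlap 1.
signal = [float(v) for v in range(10)]
starts = tile_starts(len(signal), tile=4, overlap=1)
tiles = [signal[s:s + 4] for s in starts]  # stand-in for per-tile generation
merged = merge_tiles(tiles, starts, len(signal))
```

With zero overlap, `tile_starts(2160, 720, 0)` and `tile_starts(3840, 1280, 0)` each return three offsets, so a 3840x2160 frame maps onto a 3x3 grid of 1280x720 tiles. That is the sense in which a 720p-scale model could cover a 4K canvas tile by tile.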
Problem

Research questions and friction points this paper is trying to address.

Generating ultra-high-resolution videos efficiently
Reducing computational and memory costs
Supporting various resolutions without retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free tiling for ultra-high-resolution video generation
Adaptive region-aware caching to accelerate generation
Communication-minimized tile parallelism for enhanced throughput
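
The caching idea listed above can be sketched in a few lines. This is a hedged toy model and not the paper's caching strategy: the `TileCache` class, the mean-absolute-difference test, and the fixed `threshold` are assumptions made purely for illustration. The sketch reuses a tile's previous output whenever its input has barely changed since the last time that tile was actually computed.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class TileCache:
    """Hypothetical per-tile cache keyed by tile id (illustrative only)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.inputs = {}   # tile id -> input used at the last real computation
        self.outputs = {}  # tile id -> output of the last real computation
        self.hits = 0
        self.misses = 0

    def run(self, tile_id, tile_input, compute):
        previous = self.inputs.get(tile_id)
        if previous is not None and mean_abs_diff(previous, tile_input) < self.threshold:
            self.hits += 1
            return self.outputs[tile_id]  # reuse: skip the expensive call
        self.misses += 1
        output = compute(tile_input)
        self.inputs[tile_id] = list(tile_input)
        self.outputs[tile_id] = output
        return output


def double(values):
    """Stand-in for an expensive per-tile model call."""
    return [2 * v for v in values]


cache = TileCache(threshold=0.1)
# Denoising step 1: both tiles are computed.
cache.run("static", [1.0, 1.0], double)
cache.run("moving", [0.0, 0.0], double)
# Denoising step 2: the static tile is reused, the moving tile is recomputed.
static_out = cache.run("static", [1.01, 1.0], double)
moving_out = cache.run("moving", [1.0, 1.0], double)
```

Comparing against the input of the last real computation, rather than the previous step, means slow drift accumulates and eventually forces a recompute. That is one simple way to bound the error introduced by reuse.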
Authors
Fanjiang Ye
Rice University
Zepeng Zhao
Carnegie Mellon University
Yi Mu
University of Illinois Urbana Champaign
Jucheng Shen
Rice University
Renjie Li
Texas A&M University
Kaijian Wang
Rice University
Desen Sun
University of Waterloo
Saurabh Agarwal
Indian Institute of Technology Dhanbad
Photonics MOEMS, Bio-sensors
Myungjin Lee
Cisco Systems
Networking, Systems
Triston Cao
NVIDIA
Aditya Akella
Professor, Computer Science, UT Austin
Computer Networks, Networking, Computer Systems, Systems, Communications
Arvind Krishnamurthy
University of Washington
T. S. Eugene Ng
Rice University
Zhengzhong Tu
Texas A&M University, Google Research, University of Texas at Austin
Agentic AI, Trustworthy AI, Embodied AI
Yuke Wang
Rice University