LUVE: Latent-Cascaded Ultra-High-Resolution Video Generation with Dual Frequency Experts

📅 2026-02-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Ultra-high-definition video generation faces significant challenges in motion modeling, semantic consistency, and fine-detail synthesis. This work proposes LUVE, a novel framework that introduces, for the first time, a three-stage cascaded architecture operating in the latent space: it first generates low-resolution motion dynamics, then performs latent upsampling, and finally refines high-resolution content through a dual-expert mechanism that jointly optimizes high- and low-frequency components. By integrating with video diffusion models, LUVE effectively enhances both semantic fidelity and textural detail in a unified manner. Experimental results demonstrate that LUVE substantially improves the visual realism of generated videos, and ablation studies confirm the contribution of each architectural component to the overall performance.

📝 Abstract

Recent advances in video diffusion models have significantly improved visual quality, yet ultra-high-resolution (UHR) video generation remains a formidable challenge due to the compounded difficulties of motion modeling, semantic planning, and detail synthesis. To address these limitations, we propose LUVE, a Latent-cascaded UHR Video generation framework built upon dual frequency Experts. LUVE employs a three-stage architecture: (1) low-resolution motion generation, which synthesizes motion-consistent latents; (2) video latent upsampling, which raises resolution directly in the latent space to reduce memory and computational overhead; and (3) high-resolution content refinement, which integrates low-frequency and high-frequency experts to jointly enhance semantic coherence and fine-grained detail generation. Extensive experiments demonstrate that LUVE achieves superior photorealism and content fidelity in UHR video generation, and comprehensive ablation studies further validate the effectiveness of each component. The project is available at https://unicornanrocinu.github.io/LUVE_web/.
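
The data flow of stages two and three can be illustrated with a toy sketch. This is not the authors' implementation: in LUVE the upsampler and both frequency experts are learned components operating on video latents, whereas the code below uses a single 2D grid, nearest-neighbour upsampling, and a box blur as stand-ins. All function names and the gain parameters are hypothetical and exist only to show what "upsample in latent space, then treat low- and high-frequency content separately" could look like.

```python
from typing import List, Tuple

Grid = List[List[float]]  # toy stand-in for one latent frame


def upsample_nearest(latent: Grid, factor: int) -> Grid:
    """Stage-2 stand-in: enlarge the grid by repeating each cell."""
    out: Grid = []
    for row in latent:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def box_blur(grid: Grid, radius: int = 1) -> Grid:
    """Simple low-pass filter: mean over a (2*radius+1)^2 window, clipped at edges."""
    h, w = len(grid), len(grid[0])
    out: Grid = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = [grid[y][x]
                    for y in range(max(0, i - radius), min(h, i + radius + 1))
                    for x in range(max(0, j - radius), min(w, j + radius + 1))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def split_frequencies(grid: Grid) -> Tuple[Grid, Grid]:
    """Split into a low band (blur) and a high band (residual); low + high == grid."""
    low = box_blur(grid)
    high = [[g - l for g, l in zip(g_row, l_row)]
            for g_row, l_row in zip(grid, low)]
    return low, high


def refine(grid: Grid, low_gain: float = 1.0, high_gain: float = 1.5) -> Grid:
    """Stage-3 stand-in: process each band separately, then recombine.

    The gains are placeholders for what learned experts would do.
    """
    low, high = split_frequencies(grid)
    return [[low_gain * l + high_gain * h for l, h in zip(l_row, h_row)]
            for l_row, h_row in zip(low, high)]


# Stage-1 stand-in: a tiny "low-resolution latent".
low_res: Grid = [[0.0, 1.0], [1.0, 0.0]]
upsampled = upsample_nearest(low_res, 2)   # 2x2 -> 4x4
refined = refine(upsampled)                # same shape, high band boosted
```

Because the high band is defined as the residual of the low band, recombining both with unit gains returns the upsampled grid unchanged; any change in the output comes only from how each band is processed.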

Problem

Research questions and friction points this paper is trying to address.

ultra-high-resolution video generation

motion modeling

semantic planning

detail synthesis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent-cascaded

Ultra-High-Resolution Video Generation

Dual Frequency Experts

Video Diffusion Models

Latent Space Upsampling

🔎 Similar Papers

Pyramidal Flow Matching for Efficient Video Generative Modeling

2024-10-08 · arXiv.org · Citations: 31

MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion

2024-10-10 · arXiv.org · Citations: 0