Continuous Space-Time Video Super-Resolution with 3D Fourier Fields

📅 2025-09-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional video super-resolution (VSR) methods rely on explicit inter-frame motion alignment, model space and time separately, and are sensitive to motion estimation errors. To address these limitations, this paper proposes a continuous space-time VSR formulation that models video as a 3D Video Fourier Field (VFF), a joint, continuous implicit representation of space and time. The representation removes the need for explicit frame warping and supports sampling at arbitrary spatio-temporal coordinates. A neural encoder with a large spatio-temporal receptive field, conditioned on the low-resolution input video, predicts the coefficients of the Fourier-like sinusoidal basis. An analytical Gaussian point spread function can be included in the sampling to ensure aliasing-free reconstruction at arbitrary scale. The authors report state-of-the-art results on multiple benchmarks, with sharper and temporally more consistent reconstructions across a wide range of upscaling factors and lower computational cost than existing baselines.

📝 Abstract

We introduce a novel formulation for continuous space-time video super-resolution. Instead of decoupling the representation of a video sequence into separate spatial and temporal components and relying on brittle, explicit frame warping for motion compensation, we encode video as a continuous, spatio-temporally coherent 3D Video Fourier Field (VFF). That representation offers three key advantages: (1) it enables cheap, flexible sampling at arbitrary locations in space and time; (2) it is able to simultaneously capture fine spatial detail and smooth temporal dynamics; and (3) it offers the possibility to include an analytical, Gaussian point spread function in the sampling to ensure aliasing-free reconstruction at arbitrary scale. The coefficients of the proposed, Fourier-like sinusoidal basis are predicted with a neural encoder with a large spatio-temporal receptive field, conditioned on the low-resolution input video. Through extensive experiments, we show that our joint modeling substantially improves both spatial and temporal super-resolution and sets a new state of the art for multiple benchmarks: across a wide range of upscaling factors, it delivers sharper and temporally more consistent reconstructions than existing baselines, while being computationally more efficient. Project page: https://v3vsr.github.io.
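
The sampling idea in the abstract can be illustrated with a minimal sketch. It assumes the field is a plain sum of sinusoids with one global set of coefficients; the function name `sample_fourier_field`, the parameter names, and the random coefficients are hypothetical stand-ins, not the paper's encoder output or API. The optional Gaussian point spread function uses the standard fact that blurring a sinusoid with a Gaussian only rescales its amplitude.

```python
import numpy as np


def sample_fourier_field(coords, freqs, amps, phases, psf_sigma=None):
    """Evaluate sum_k a_k * sin(<w_k, p> + phi_k) at points p = (x, y, t).

    coords:    (N, 3) query points in space-time
    freqs:     (K, 3) angular frequencies w_k
    amps:      (K,)   amplitudes a_k
    phases:    (K,)   phases phi_k
    psf_sigma: optional (3,) standard deviations of a Gaussian PSF
               along x, y, t (illustrative assumption, axis-aligned)
    """
    coords = np.asarray(coords, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    weights = np.asarray(amps, dtype=float)
    phases = np.asarray(phases, dtype=float)

    if psf_sigma is not None:
        sigma = np.asarray(psf_sigma, dtype=float)
        # A Gaussian blur scales each sinusoid by exp(-0.5 * sum_i (sigma_i * w_i)^2),
        # so high frequencies are damped without any numerical integration.
        weights = weights * np.exp(-0.5 * np.sum((freqs * sigma) ** 2, axis=1))

    arg = coords @ freqs.T + phases  # (N, K) phase of every basis function
    return np.sin(arg) @ weights     # (N,) field values


# Hypothetical usage: random coefficients stand in for the encoder's prediction.
rng = np.random.default_rng(0)
K = 16
freqs = rng.normal(scale=2.0, size=(K, 3))
amps = rng.normal(size=K)
phases = rng.uniform(0.0, 2.0 * np.pi, size=K)

# Any grid density can be queried; here 8 x 8 pixels over 5 time steps.
xs, ys, ts = np.linspace(0, 1, 8), np.linspace(0, 1, 8), np.linspace(0, 1, 5)
grid = np.stack(np.meshgrid(xs, ys, ts, indexing="ij"), axis=-1).reshape(-1, 3)
video = sample_fourier_field(grid, freqs, amps, phases, psf_sigma=(0.05, 0.05, 0.1))
video = video.reshape(8, 8, 5)
```

Because the query points are ordinary real-valued coordinates, changing the output resolution or frame rate in this sketch only means passing a different grid; the coefficients stay fixed.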

Problem

Research questions and friction points this paper is trying to address.

Develops a continuous space-time video super-resolution method

Replaces brittle, explicit frame warping with a 3D Video Fourier Field representation

Enables aliasing-free reconstruction at arbitrary scales with cheap sampling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous 3D Video Fourier Field representation

Neural encoder predicts Fourier basis coefficients

Analytical Gaussian point spread function prevents aliasing
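
Why a Gaussian point spread function can be applied analytically to a sinusoidal basis follows from a standard identity, sketched here for a single basis function. The notation (amplitude $a$, angular frequency $\boldsymbol{\omega}$, phase $\varphi$, PSF covariance $\Sigma$) is assumed for illustration and is not taken from the paper.

```latex
% One basis function of the field at a space-time point p = (x, y, t):
f(\mathbf{p}) = a \,\sin\!\left(\boldsymbol{\omega}^{\top}\mathbf{p} + \varphi\right)

% Convolving with a zero-mean Gaussian g_\Sigma of covariance \Sigma
% leaves the frequency and phase unchanged and only scales the amplitude:
(g_{\Sigma} * f)(\mathbf{p})
  = a \,\exp\!\left(-\tfrac{1}{2}\,\boldsymbol{\omega}^{\top}\Sigma\,\boldsymbol{\omega}\right)
    \sin\!\left(\boldsymbol{\omega}^{\top}\mathbf{p} + \varphi\right)
```

The attenuation factor decays with the squared frequency, so choosing a wider PSF for coarser output grids damps exactly the components that the grid could not represent, with no numerical integration required.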
