BF-STVSR: B-Splines and Fourier-Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution

📅 2025-01-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In continuous spatial-temporal video super-resolution (C-STVSR), implicit neural representations (INRs) have limited capacity to model video data, while position encoding and pre-trained optical flow networks introduce rigidity and architectural constraints. To address these issues, we propose a flow-free, position-encoding-free end-to-end framework. Our method replaces explicit motion estimation with a B-spline mapper that enables smooth, differentiable temporal interpolation, and substitutes conventional position encoding with Fourier feature mapping that captures the dominant spatial frequencies. This design targets both spatial-temporal consistency and high-fidelity texture reconstruction. Evaluated on multiple benchmarks, our approach achieves state-of-the-art PSNR and SSIM scores, with more natural motion and better temporal coherence. Moreover, by dropping pre-trained optical flow networks and fixed position encodings, the framework gains flexibility and generalizes better across diverse motion patterns and video content.
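To make the contrast between a fixed position encoding and a Fourier feature mapping concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's Fourier Mapper: the function names, the power-of-two frequency schedule, and the idea of passing frequencies and amplitudes in as arguments (standing in for values a network would predict) are all hypothetical.

```python
import math


def fixed_position_encoding(x, num_bands):
    """Fixed sinusoidal encoding of a scalar coordinate.

    The frequencies are preset powers of two, so they cannot adapt to the
    content being encoded.
    """
    out = []
    for k in range(num_bands):
        freq = 2.0 ** k
        out.append(math.sin(math.pi * freq * x))
        out.append(math.cos(math.pi * freq * x))
    return out


def fourier_features(coord, freqs, amps):
    """Sinusoidal features with per-band 2D frequencies and amplitudes.

    coord: (x, y) spatial coordinate.
    freqs: list of (fx, fy) pairs. Assumption: in a learned model these would
           be predicted from the input features instead of being fixed.
    amps:  list of amplitudes, one per frequency pair (same assumption).
    """
    x, y = coord
    out = []
    for (fx, fy), a in zip(freqs, amps):
        phase = 2.0 * math.pi * (fx * x + fy * y)
        out.append(a * math.cos(phase))
        out.append(a * math.sin(phase))
    return out


if __name__ == "__main__":
    print(fixed_position_encoding(0.5, 2))
    print(fourier_features((0.25, 0.0), [(1.0, 0.0), (3.0, 2.0)], [1.0, 0.5]))
```

The only structural difference is where the frequencies come from: the first function hard-codes them, while the second accepts them as data, which is what lets a model emphasize the frequencies that dominate a given region.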

📝 Abstract

Enhancing low-resolution, low-frame-rate videos to high-resolution, high-frame-rate quality is essential for a seamless user experience, motivating advancements in Continuous Spatial-Temporal Video Super-Resolution (C-STVSR). While prior methods employ Implicit Neural Representation (INR) for continuous encoding, they often struggle to capture the complexity of video data, relying on simple coordinate concatenation and a pre-trained optical flow network for motion representation. Interestingly, we find that adding position encoding, contrary to common observations, does not improve performance and can even degrade it. This issue becomes particularly pronounced when combined with pre-trained optical flow networks, which can limit the model's flexibility. To address these issues, we propose BF-STVSR, a C-STVSR framework with two key modules tailored to better represent the spatial and temporal characteristics of video: 1) a B-spline Mapper for smooth temporal interpolation, and 2) a Fourier Mapper for capturing dominant spatial frequencies. Our approach achieves state-of-the-art PSNR and SSIM performance, showing enhanced spatial details and natural temporal consistency.
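As a rough illustration of why B-splines suit smooth temporal interpolation, the sketch below evaluates a uniform cubic B-spline over a short sequence of control points at an arbitrary time t in [0, 1]. It is a generic textbook construction, not the paper's B-spline Mapper: the function names are hypothetical, and the control points stand in for whatever per-pixel motion coefficients a network might predict.

```python
def cubic_bspline_weights(u):
    """Uniform cubic B-spline basis weights for local parameter u in [0, 1].

    The four weights are non-negative and always sum to 1.
    """
    u2 = u * u
    u3 = u2 * u
    return (
        (1.0 - 3.0 * u + 3.0 * u2 - u3) / 6.0,
        (4.0 - 6.0 * u2 + 3.0 * u3) / 6.0,
        (1.0 + 3.0 * u + 3.0 * u2 - 3.0 * u3) / 6.0,
        u3 / 6.0,
    )


def bspline_eval(control_points, t):
    """Evaluate a uniform cubic B-spline at normalized time t in [0, 1].

    control_points: list of equal-length tuples, at least four of them.
    Assumption: here they play the role of motion vectors at coarse time steps.
    """
    n_seg = len(control_points) - 3
    if n_seg < 1:
        raise ValueError("need at least four control points")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    s = t * n_seg
    i = min(int(s), n_seg - 1)  # segment index; clamp so t == 1 stays valid
    u = s - i                   # local parameter inside the segment
    w = cubic_bspline_weights(u)
    dim = len(control_points[0])
    return tuple(
        sum(w[j] * control_points[i + j][d] for j in range(4))
        for d in range(dim)
    )


if __name__ == "__main__":
    motion = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0), (3.0, -0.5), (4.0, 0.0)]
    for step in range(5):
        t = step / 4.0
        print(t, bspline_eval(motion, t))
```

Because each output is a convex combination of four neighboring control points, the curve stays inside their range and is twice continuously differentiable across segment boundaries, which is the smoothness property that makes the basis attractive for querying intermediate frames at any time.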

Problem

Research questions and friction points this paper is trying to address.

C-STVSR

INR

pre-trained optical flow network

Innovation

Methods, ideas, or system contributions that make the work stand out.

BF-STVSR

B-spline Mapper

Fourier Mapper
