🤖 AI Summary
This work addresses the computational inefficiency that dense pixel queries cause in continuous space-time video super-resolution. The authors propose an efficient framework based on 2D Gaussian splatting that supports arbitrary spatial and temporal upscaling factors. By using continuous motion modeling to drive the spatiotemporal evolution of Gaussian kernels, the method avoids conventional dense grid-based sampling. Key innovations include a lightweight intermediate fitting strategy, an optical flow-guided motion module, a covariance resampling alignment mechanism, and an adaptive offset windowing scheme for large motion, which together improve efficiency and robustness. The approach achieves state-of-the-art quality on the Vid4, GoPro, and Adobe240 benchmarks, with near-constant inference time at conventional temporal scales (×2 to ×8) and over 3× speedup at the extreme temporal scale of ×32.
📝 Abstract
Continuous Spatio-Temporal Video Super-Resolution (C-STVSR) aims to simultaneously enhance the spatial resolution and frame rate of videos by arbitrary scale factors, offering greater flexibility than fixed-scale methods that are constrained by predefined upsampling ratios. In recent years, methods based on Implicit Neural Representations (INR) have made significant progress in C-STVSR by learning continuous mappings from spatio-temporal coordinates to pixel values. However, these methods fundamentally rely on dense pixel-wise grid queries, so their computational cost scales linearly with the number of interpolated frames, which severely limits inference efficiency. We propose GS-STVSR, an ultra-efficient C-STVSR framework based on 2D Gaussian Splatting (2D-GS) that drives the spatiotemporal evolution of Gaussian kernels through continuous motion modeling, bypassing dense grid queries entirely. We exploit the strong temporal stability of covariance parameters for lightweight intermediate fitting, design an optical flow-guided motion module to derive Gaussian positions and colors at arbitrary time steps, introduce a covariance resampling alignment module to prevent covariance drift, and propose an adaptive offset window to handle large-scale motion. Extensive experiments on Vid4, GoPro, and Adobe240 show that GS-STVSR achieves state-of-the-art quality across all benchmarks. Moreover, its inference time remains nearly constant at conventional temporal scales (×2 to ×8), and it delivers over 3× speedup at the extreme scale of ×32, demonstrating strong practical applicability.
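
To make the core idea concrete, the following is a minimal NumPy sketch of rendering a frame at an arbitrary time step by moving 2D Gaussian kernels and splatting them. It is not the GS-STVSR implementation: the function names (`covariance`, `advect`, `splat`), the linear motion model `mu(t) = mu0 + t * flow`, the fixed per-Gaussian covariance, and the simple additive blending are all assumptions chosen for illustration. The abstract only states that an optical flow-guided module derives Gaussian position and color at arbitrary time steps and that covariance parameters are temporally stable.

```python
import numpy as np

def covariance(scale, theta):
    """Build a 2x2 covariance R diag(scale^2) R^T from axis scales and a rotation angle."""
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    return rot @ np.diag(np.asarray(scale, dtype=float) ** 2) @ rot.T

def advect(mu0, flow, t):
    """Assumed linear motion: move each Gaussian centre to mu0 + t * flow."""
    return mu0 + t * flow

def splat(mu, cov, color, height, width):
    """Additively accumulate N 2D Gaussians onto a height x width image."""
    ys, xs = np.mgrid[0:height, 0:width]
    pix = np.stack([xs, ys], axis=-1).astype(float)  # (H, W, 2), (x, y) order
    image = np.zeros((height, width, color.shape[1]))
    for centre, sigma, rgb in zip(mu, cov, color):
        d = pix - centre
        # Squared Mahalanobis distance of every pixel to this Gaussian.
        maha = np.einsum("hwi,ij,hwj->hw", d, np.linalg.inv(sigma), d)
        image += np.exp(-0.5 * maha)[..., None] * rgb
    return np.clip(image, 0.0, 1.0)

# One red Gaussian that moves 8 pixels to the right between t = 0 and t = 1.
mu0 = np.array([[4.0, 8.0]])            # centres as (x, y)
flow = np.array([[8.0, 0.0]])           # hypothetical per-Gaussian displacement
cov = np.stack([covariance((1.5, 1.5), 0.0)])  # reused unchanged for every t
color = np.array([[1.0, 0.0, 0.0]])

# Any real-valued t can be rendered from the same set of Gaussians.
frames = {t: splat(advect(mu0, flow, t), cov, color, 16, 16) for t in (0.0, 0.5, 1.0)}
```

In this sketch, moving to a new time step only updates the Gaussian centres; the per-pixel loop here is deliberately naive, and an efficient renderer would restrict each Gaussian to a local window rather than evaluating it over the full image.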