🤖 AI Summary
Existing video processing methods exhibit limited performance in slow-motion synthesis, super-resolution, denoising, and inpainting. This paper introduces ActINR, a framework built on the observation that implicit neural representations (INRs) are well suited to inter-frame motion modeling, specifically by treating INR bias parameters as learnable temporal priors. ActINR views the INR as a learnable dictionary and employs a time-conditioned MLP to generate temporally adaptive bias terms, enabling joint end-to-end optimization with weights shared across frames. The framework unifies multiple video restoration tasks, including 10× slow-motion generation, 4× super-resolution coupled with 2× slow motion, video denoising, and inpainting. Evaluated on standard benchmarks, ActINR achieves PSNR gains over prior methods often exceeding 6 dB. This work advances both the theoretical understanding of INRs for dynamic video modeling and their practical effectiveness, establishing a unified, parameter-efficient paradigm for spatiotemporal video representation learning.
📝 Abstract
We propose a new continuous video modeling framework based on implicit neural representations (INRs) called ActINR. At the core of our approach is the observation that INRs can be considered as a learnable dictionary, with the shapes of the basis functions governed by the weights of the INR, and their locations governed by the biases. Given compact non-linear activation functions, we hypothesize that an INR's biases are suitable to capture motion across images, and facilitate compact representations for video sequences. Using these observations, we design ActINR to share INR weights across frames of a video sequence, while using unique biases for each frame. We further model the biases as the output of a separate INR conditioned on time index to promote smoothness. By training the video INR and this bias INR together, we demonstrate unique capabilities, including $10\times$ video slow motion, $4\times$ spatial super resolution along with $2\times$ slow motion, denoising, and video inpainting. ActINR performs remarkably well across numerous video processing tasks (often achieving more than 6 dB improvement), setting a new standard for continuous modeling of videos.
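The weight-sharing idea in the abstract can be illustrated with a minimal sketch: one set of INR weights is shared across all frames, while a small "bias INR" maps a scalar time index to the hidden-layer biases, so only the bias vector changes from frame to frame. All layer sizes, the sinusoidal activations, and the tiny bias network below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared video-INR weights (fixed across all frames): basis-function
# *shapes* come from the weights, their *locations* from the biases.
D_IN, D_HID, D_OUT = 2, 16, 3            # (x, y) coordinate -> RGB
W1 = rng.normal(0, 1 / np.sqrt(D_IN), (D_IN, D_HID))
W2 = rng.normal(0, 1 / np.sqrt(D_HID), (D_HID, D_OUT))

# Hypothetical "bias INR": a tiny MLP mapping time t to the hidden-layer
# biases, so the biases vary smoothly in time while the weights are shared.
Bw1 = rng.normal(0, 1.0, (1, 8))
Bw2 = rng.normal(0, 1 / np.sqrt(8), (8, D_HID))

def bias_inr(t):
    """Generate the frame-specific bias vector from a scalar time index."""
    h = np.sin(Bw1 * t)                  # sinusoidal features of time
    return h @ Bw2                       # shape (1, D_HID)

def act_inr(coords, t):
    """Evaluate the video INR at spatial coords for frame time t."""
    b = bias_inr(t)                      # time-dependent biases
    h = np.sin(coords @ W1 + b)          # compact (sinusoidal) activations
    return h @ W2                        # RGB output per coordinate

# Query a 4x4 grid of coordinates at two different times: same weights,
# different biases -> the representation shifts, modeling motion.
coords = np.stack(np.meshgrid(np.linspace(-1, 1, 4),
                              np.linspace(-1, 1, 4)), -1).reshape(-1, 2)
out_t0 = act_inr(coords, 0.0)
out_t1 = act_inr(coords, 0.5)
print(out_t0.shape)                      # (16, 3)
```

Because time enters only through the bias network, frames at arbitrary intermediate times (e.g. for slow motion) can be queried by evaluating `act_inr` at fractional `t`.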