VTinker: Guided Flow Upsampling and Texture Mapping for High-Resolution Video Frame Interpolation

📅 2025-11-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In high-resolution video frame interpolation, flows are usually estimated at low resolution and then upsampled coarsely, so the resulting motion model is inaccurate and interpolated frames show ghosting, blurred edges, and texture discontinuities. VTinker addresses this with two core components: (1) guided flow upsampling (GFU), which uses the input frames as guidance to sharpen the edges of the upsampled flows; and (2) Texture Mapping, which first generates an initial interpolated frame (the intermediate proxy) and then uses it as a cue to select clear texture blocks from the input frames and map them onto the proxy. The full pipeline runs bidirectional low-resolution flow estimation, guided upsampling, texture mapping, and a reconstruction module that produces the final frame. The authors report state-of-the-art VFI performance in extensive experiments.

📝 Abstract

Due to large pixel movement and high computational cost, estimating the motion of high-resolution frames is challenging. Thus, most flow-based Video Frame Interpolation (VFI) methods first predict bidirectional flows at low resolution and then use high-magnification upsampling (e.g., bilinear) to obtain the high-resolution ones. However, this upsampling strategy may cause blur or mosaic artifacts at the flows' edges. Additionally, the motion of fine pixels at high resolution cannot be adequately captured by motion estimation at low resolution, which leads to misaligned task-oriented flows. With such inaccurate flows, input frames are warped and combined pixel-by-pixel, resulting in ghosting and discontinuities in the interpolated frame. In this study, we propose a novel VFI pipeline, VTinker, which consists of two core components: guided flow upsampling (GFU) and Texture Mapping. After motion estimation at low resolution, GFU introduces the input frames as guidance to alleviate the blurred details in bilinearly upsampled flows, which makes the flows' edges clearer. Subsequently, to avoid pixel-level ghosting and discontinuities, Texture Mapping generates an initial interpolated frame, referred to as the intermediate proxy. The proxy serves as a cue for selecting clear texture blocks from the input frames, which are then mapped onto the proxy so that a reconstruction module can produce the final interpolated frame. Extensive experiments demonstrate that VTinker achieves state-of-the-art performance in VFI. Code is available at: https://github.com/Wucy0519/VTinker.
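The contrast between plain bilinear flow upsampling and frame-guided upsampling can be illustrated with a small 1-D sketch. This is not the GFU module from the paper, whose design is not specified here; as an assumption, it swaps in joint bilateral weighting, where each interpolation weight is scaled by how similar the guide intensities are. The function name `upsample_flow_1d` and the parameter `sigma` are hypothetical.

```python
import math

def upsample_flow_1d(flow_lr, scale, guide_hr=None, sigma=0.1):
    """Upsample a 1-D flow by `scale`; optionally use a high-res guide signal.

    Without a guide this is plain linear interpolation. With a guide, each
    weight is multiplied by a Gaussian on the guide-intensity difference
    (joint bilateral weighting, an assumed stand-in for learned guidance).
    """
    last = len(flow_lr) - 1
    if guide_hr is not None:
        # Low-res guide: mean intensity of each block of `scale` pixels.
        guide_lr = [sum(guide_hr[j * scale:(j + 1) * scale]) / scale
                    for j in range(len(flow_lr))]
    out = []
    for i in range(len(flow_lr) * scale):
        p = (i + 0.5) / scale - 0.5          # position in low-res coordinates
        j0 = math.floor(p)
        t = p - j0
        weights = []
        for j, w in ((j0, 1.0 - t), (j0 + 1, t)):
            j = min(max(j, 0), last)         # clamp at the borders
            if guide_hr is not None:
                d = guide_hr[i] - guide_lr[j]
                w *= math.exp(-d * d / (2.0 * sigma * sigma))
            weights.append((j, w))
        total = sum(w for _, w in weights) + 1e-12
        value = sum(flow_lr[j] * w for j, w in weights) / total
        out.append(value * scale)            # displacements scale with resolution
    return out

# A motion boundary between low-res samples 1 and 2 (high-res pixel 8).
flow_lr = [0.0, 0.0, 2.0, 2.0]
guide_hr = [0.0] * 8 + [1.0] * 8             # image edge at the same place

bilinear = upsample_flow_1d(flow_lr, 4)
guided = upsample_flow_1d(flow_lr, 4, guide_hr)
print([round(v, 2) for v in bilinear[6:10]])  # [1.0, 3.0, 5.0, 7.0]
print([round(v, 2) for v in guided[6:10]])    # [0.0, 0.0, 8.0, 8.0]
```

The bilinear result assigns pixels near the boundary flow values that belong to neither moving region, which is the edge blur the abstract describes; the guided variant keeps the step because the guide intensities disagree across the edge.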

Problem

Research questions and friction points this paper is trying to address.

Addresses blurred or mosaic flow edges caused by high-magnification upsampling of low-resolution flows

Reduces ghosting artifacts from inaccurate optical flow estimation

Improves texture alignment in interpolated video frames
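How inaccurate flows turn into ghosting under pixel-by-pixel blending can be seen in a toy 1-D case. The sketch below assumes nearest-neighbour backward warping and a plain average of the two warped frames; both are simplifications rather than the method of any particular VFI model, and the helper names are hypothetical.

```python
def backward_warp_1d(frame, flow):
    """Sample `frame` at x + flow[x] (nearest neighbour, border clamped)."""
    n = len(frame)
    return [frame[min(max(round(x + flow[x]), 0), n - 1)] for x in range(n)]

def blend(a, b):
    """Average two warped frames pixel by pixel."""
    return [(p + q) / 2 for p, q in zip(a, b)]

n = 12
frame0 = [0] * n
frame1 = [0] * n
frame0[2:4] = [1, 1]      # bright bar at pixels 2-3 in the first frame
frame1[6:8] = [1, 1]      # same bar at pixels 6-7 in the second frame

# Accurate flows from the middle frame to each input: -2 and +2 pixels.
sharp = blend(backward_warp_1d(frame0, [-2] * n),
              backward_warp_1d(frame1, [+2] * n))

# Underestimated flows: -1 and +1 pixels.
ghosted = blend(backward_warp_1d(frame0, [-1] * n),
                backward_warp_1d(frame1, [+1] * n))

print(sharp)    # one full-intensity bar at pixels 4-5
print(ghosted)  # two overlapping half-intensity copies at pixels 3-6
```

With accurate flows both warped frames place the bar at the same pixels and the average stays sharp. With a one-pixel error in each direction the two copies no longer coincide, so the average spreads a half-intensity bar over twice the width.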

Innovation

Methods, ideas, or system contributions that make the work stand out.

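Guided flow upsampling (GFU) uses the input frames as guidance to sharpen flow edges after bilinear upsampling

Texture Mapping generates an intermediate proxy frame and uses it as a cue to select clear texture blocks from the input frames

Reconstruction module produces the final interpolated frame from the proxy and the mapped texture blocks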
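The idea of using a proxy as a cue for picking texture blocks from the input frames can be sketched with simple block matching. This is an illustrative assumption only: it uses a sum of absolute differences over a small search window in 1-D, whereas the actual selection and mapping in VTinker may work quite differently (see the authors' repository). The function name `select_texture_blocks` and the parameters `block` and `search` are hypothetical.

```python
def select_texture_blocks(proxy, frames, block=4, search=2):
    """Replace each proxy block with the best-matching block from the inputs.

    For every non-overlapping block of the proxy, candidates are taken from
    each input frame at offsets within +/- `search` pixels, and the one with
    the lowest sum of absolute differences to the proxy block is copied in.
    """
    out = list(proxy)
    for start in range(0, len(proxy) - block + 1, block):
        target = proxy[start:start + block]
        best_cost, best_patch = float("inf"), target
        for frame in frames:
            for shift in range(-search, search + 1):
                s = start + shift
                if s < 0 or s + block > len(frame):
                    continue                      # candidate falls off the frame
                cand = frame[s:s + block]
                cost = sum(abs(a - b) for a, b in zip(target, cand))
                if cost < best_cost:
                    best_cost, best_patch = cost, cand
        out[start:start + block] = best_patch
    return out

frame0 = [0, 0, 1, 1, 0, 0, 0, 0]                # bar at pixels 2-3
frame1 = [0, 0, 0, 0, 1, 1, 0, 0]                # bar at pixels 4-5
proxy = [0, 0, 0.2, 0.7, 0.7, 0.2, 0, 0]         # blurry guess: bar near 3-4

print(select_texture_blocks(proxy, [frame0, frame1]))
# [0, 0, 0, 1, 1, 0, 0, 0]
```

The proxy is too soft to use directly, but it is close enough to the true middle frame to identify which sharp input blocks belong where. In the paper's pipeline the mapped result is then passed to a reconstruction module rather than used as the final output.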
