🤖 AI Summary
Existing video frame interpolation (VFI) quality metrics—such as PSNR, SSIM, and LPIPS—fail to adequately capture the perceptual impact of interpolation artifacts and neglect temporal coherence; dedicated metrics like FloLPIPS incur prohibitive computational overhead, limiting practical applicability. To address this, we propose the first motion-field divergence-based quality assessment method, which explicitly models temporal inconsistency via a divergence-weighting mechanism within the PSNR framework, enabling efficient and perceptually consistent evaluation. Our metric is differentiable and can be seamlessly integrated as a loss function in neural VFI training. Evaluated on the BVI-VFI benchmark, it achieves a 0.09 higher Pearson linear correlation coefficient with human judgments than FloLPIPS, while accelerating inference by 2.5× and reducing memory consumption by 4×. Moreover, it demonstrates robust generalization across diverse content categories.
📝 Abstract
Video frame interpolation is a fundamental tool for temporal video enhancement, but existing quality metrics struggle to evaluate the perceptual impact of interpolation artefacts effectively. Metrics like PSNR, SSIM and LPIPS ignore temporal coherence. State-of-the-art quality metrics tailored towards video frame interpolation, like FloLPIPS, have been developed but suffer from computational inefficiency that limits their practical application. We present $\text{PSNR}_{\text{DIV}}$, a novel full-reference quality metric that enhances PSNR through motion divergence weighting, a technique adapted from archival film restoration, where it was developed to detect temporal inconsistencies. Our approach highlights singularities in motion fields, which are then used to weight image errors. Evaluation on the BVI-VFI dataset (180 sequences across multiple frame rates, resolutions and interpolation methods) shows $\text{PSNR}_{\text{DIV}}$ achieves statistically significant improvements: +0.09 Pearson Linear Correlation Coefficient over FloLPIPS, while being 2.5$\times$ faster and using 4$\times$ less memory. Performance remains consistent across all content categories and is robust to the motion estimator used. The efficiency and accuracy of $\text{PSNR}_{\text{DIV}}$ enable fast quality evaluation and practical use as a loss function for training neural networks for video frame interpolation tasks. An implementation of our metric is available at www.github.com/conalld/psnr-div.
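The abstract does not spell out the exact weighting formula, but the core idea (weight per-pixel errors by the magnitude of the motion-field divergence inside the PSNR computation) can be sketched as follows. The `alpha` parameter, the weight normalisation, and the finite-difference divergence are illustrative assumptions, not the authors' published formulation; see the linked repository for the actual implementation.

```python
import numpy as np

def flow_divergence(flow):
    """Finite-difference divergence of a dense motion field.

    flow: (H, W, 2) array; flow[..., 0] = horizontal (u), flow[..., 1] = vertical (v).
    Returns du/dx + dv/dy, which spikes at motion singularities
    (occlusions, interpolation failures).
    """
    du_dx = np.gradient(flow[..., 0], axis=1)
    dv_dy = np.gradient(flow[..., 1], axis=0)
    return du_dx + dv_dy

def psnr_div(ref, test, flow, alpha=1.0, max_val=255.0):
    """Illustrative divergence-weighted PSNR (hypothetical formulation).

    Errors in regions of high motion-field divergence are up-weighted,
    so temporally inconsistent artefacts lower the score more.
    """
    # Per-pixel weights from divergence magnitude; alpha is an assumed knob.
    w = 1.0 + alpha * np.abs(flow_divergence(flow))
    w /= w.mean()  # normalise so weights average to 1 (assumption)
    err = ref.astype(np.float64) - test.astype(np.float64)
    weighted_mse = np.mean(w * err ** 2)
    return 10.0 * np.log10(max_val ** 2 / weighted_mse)
```

With a zero motion field the weights are uniform and the score reduces to ordinary PSNR, which is a useful sanity check; all operations are differentiable, consistent with the metric's advertised use as a training loss.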