FDIM: A Feature-distance-based Generic Video Quality Metric for Versatile Codecs

📅 2026-04-27

🤖 AI Summary
Existing video quality assessment methods struggle to capture the content-dependent and generative distortions that neural codecs introduce in SDR and HDR content, and they often fail to generalize across codecs, content types, and dynamic ranges. To address this, the work proposes FDIM, a generic video quality metric that combines multi-scale deep features with hand-crafted features to model structural, textural, and semantic distortions, with the hand-crafted features acting as stable complementary cues that improve generalization. FDIM is trained on DCVQA, a large-scale subjective dataset of over 16k sequences encoded by both traditional and neural codecs, and is evaluated on ten SDR/HDR datasets containing unseen codecs. The experiments indicate strong generalization and high correlation with subjective scores. The source code and the DCVQA validation set are to be released.
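
Agreement between a metric and human perception is usually quantified with correlation coefficients between predicted scores and mean opinion scores (MOS). The summary does not say which coefficients the authors report, so the sketch below assumes the two most common choices in VQA work, Pearson (PLCC) and Spearman rank (SROCC) correlation. The function names and the sample scores are hypothetical and are not taken from the paper.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation (SROCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: subjective scores and metric outputs for five clips.
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [0.1, 0.4, 0.9, 1.6, 2.5]  # monotonic in MOS, but not linear
srocc = spearman(mos, predicted)
plcc = pearson(mos, predicted)
```

In this toy case the ranking is preserved exactly, so SROCC is at its maximum while PLCC is lower because the relationship is not linear. For a distance-type metric, where larger values mean worse quality, the correlation with MOS is negative and its magnitude is what gets reported.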

📝 Abstract
Video technology is advancing toward Ultra High Definition (UHD) and High Dynamic Range (HDR), which intensifies the need for higher compression efficiency for these high-specification videos. Beyond advances in traditional codecs, neural video codecs (NVCs) have attracted significant research attention and have evolved rapidly over the past few years. The coding artifacts of NVCs often exhibit content-varying and generative characteristics, which differ from those of conventional codecs and are challenging for traditional video quality assessment (VQA) methods to capture. Therefore, VQA metrics are required to generalize across different codecs, content types, and dynamic ranges to better support video codec research and evaluation. In this paper, we propose FDIM, a feature-distance-based generic video quality metric for both traditional and neural video codecs across SDR and HDR formats. FDIM employs a hybrid architecture that integrates deep and hand-crafted features. The deep feature component learns multi-scale representations to capture distortions ranging from structural and textural fidelity degradation to high-level semantic deviations, while the hand-crafted feature component provides stable complementary cues to improve overall generalization. We trained FDIM on a large-scale subjective quality assessment dataset (DCVQA) consisting of over 16k video sequences encoded by traditional block-based hybrid video codecs and end-to-end perceptually optimized neural video codecs. Extensive experiments on ten SDR/HDR VQA datasets containing diverse, previously unseen codecs demonstrate that FDIM achieves strong generalization and high correlation with subjective assessment. The source code for FDIM and the DCVQA validation set will be released at https://github.com/MCL-ZJU/FDIM.
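
The hybrid idea in the abstract, a distance between multi-scale features of the reference and the distorted video combined with a hand-crafted cue, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: FDIM uses learned deep features, whereas this sketch substitutes simple pixel gradients, uses plain pixel MSE as the hand-crafted cue, and picks arbitrary weights. It is not the authors' implementation, which is to be published at the repository linked above.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def downsample(frame: Frame) -> Frame:
    """Halve the resolution with 2x2 average pooling (odd edges are dropped)."""
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [
        [
            (frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
             + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def gradient_features(frame: Frame) -> List[float]:
    """Toy stand-in for a learned feature map: right and down pixel differences."""
    h, w = len(frame), len(frame[0])
    feats = []
    for y in range(h):
        for x in range(w):
            right = frame[y][x + 1] if x + 1 < w else frame[y][x]
            down = frame[y + 1][x] if y + 1 < h else frame[y][x]
            feats.append(right - frame[y][x])
            feats.append(down - frame[y][x])
    return feats


def mean_squared_distance(a: List[float], b: List[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def multiscale_feature_distance(ref: Frame, dist: Frame, num_scales: int = 3) -> float:
    """Average the feature distance over a small resolution pyramid."""
    per_scale = []
    for _ in range(num_scales):
        per_scale.append(
            mean_squared_distance(gradient_features(ref), gradient_features(dist))
        )
        if len(ref) < 2 or len(ref[0]) < 2:
            break  # frame is too small to pool any further
        ref, dist = downsample(ref), downsample(dist)
    return sum(per_scale) / len(per_scale)


def pixel_mse(ref: Frame, dist: Frame) -> float:
    """Hand-crafted cue assumed for this sketch: plain pixel-wise MSE."""
    flat_ref = [v for row in ref for v in row]
    flat_dist = [v for row in dist for v in row]
    return mean_squared_distance(flat_ref, flat_dist)


def hybrid_score(ref: Frame, dist: Frame, w_feat: float = 0.7, w_hand: float = 0.3) -> float:
    """Weighted sum of both cues (weights are arbitrary); larger means more distortion."""
    return w_feat * multiscale_feature_distance(ref, dist) + w_hand * pixel_mse(ref, dist)


# Hypothetical 8x8 reference frame and two distorted copies of it.
reference = [[float(x + y) for x in range(8)] for y in range(8)]
mild = [row[:] for row in reference]
mild[3][3] += 1.0
severe = [row[:] for row in reference]
severe[3][3] += 10.0
```

Calling `hybrid_score(reference, reference)` returns 0.0, and the score grows with the size of the perturbation. A trained metric would additionally map such distances to the subjective quality scale, a step this sketch leaves out.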
Problem

Research questions and friction points this paper is trying to address.

neural video codecs
video quality assessment
generalization
HDR
coding artifacts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature-distance
Neural Video Codec
Hybrid Architecture
Generic VQA
HDR/SDR Generalization
Jiayi Wang
College of Information Science and Electronic Engineering, Zhejiang University; Zhejiang Key Laboratory of Multimodal Communication Networks and Intelligent Information Processing, Hangzhou 310058, China
Lichun Zhang
Central Media Technology Institute, Huawei, Hangzhou 310058, China
Xiaoqi Zhuang
College of Information Science and Electronic Engineering, Zhejiang University; Zhejiang Key Laboratory of Multimodal Communication Networks and Intelligent Information Processing, Hangzhou 310058, China
Jiaqi Zhang
College of Information Science and Electronic Engineering, Zhejiang University; Zhejiang Key Laboratory of Multimodal Communication Networks and Intelligent Information Processing, Hangzhou 310058, China
Lu Yu
Professor, Institute of Information and Communication Engineering, Zhejiang University
Video Coding, Multimedia Communication
Yin Zhao
Staff Engineer, Huawei Technologies
Video coding, quality assessment