InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model

📅 2025-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited representational capacity of lightweight models in compressed video quality assessment. We propose a novel knowledge distillation paradigm leveraging the InternVideo2 large vision-language model, marking the first application of video foundation models to compression distortion modeling. Our approach introduces a compression-quality-prior-guided distillation strategy, incorporating multi-backbone architecture comparison, a compression-domain feature alignment loss, and a lightweight quality regression head. Evaluated on mainstream compressed video quality datasets, the distilled lightweight model substantially outperforms conventional methods and existing lightweight competitors, reducing parameter count by 98.7% and accelerating inference by 3.2×. Key contributions are: (1) pioneering the adaptation of video foundation models to compression distortion perception; (2) designing a distillation framework explicitly aligned with compression-domain priors; and (3) systematically validating the efficacy of multi-backbone architectures in quality-aware distillation.
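The distillation objective sketched above combines a compression-domain feature alignment loss with a lightweight quality regression head. A minimal NumPy sketch of such an objective is given below; the function names, the learned projection matrix `proj`, and the loss weighting `lam` are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

# Hypothetical sketch of a prior-guided distillation objective:
# align student features to frozen teacher (InternVideo2-style) features,
# while regressing a quality score against MOS labels.

def alignment_loss(student_feat, teacher_feat, proj):
    """MSE between projected student features and frozen teacher features."""
    return np.mean((student_feat @ proj - teacher_feat) ** 2)

def quality_loss(pred_score, mos):
    """Regression loss of the lightweight quality head against MOS labels."""
    return np.mean((pred_score - mos) ** 2)

def distill_loss(student_feat, teacher_feat, proj, pred_score, mos, lam=0.5):
    """Total objective: quality regression plus lam-weighted feature alignment."""
    return quality_loss(pred_score, mos) + lam * alignment_loss(
        student_feat, teacher_feat, proj
    )
```

In this sketch the teacher features are treated as fixed targets (the teacher is frozen during distillation), and only the student backbone, the projection, and the regression head would receive gradients in a real training loop.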

📝 Abstract
Video quality assessment tasks rely heavily on the rich features required for video understanding, such as semantic information, texture, and temporal motion. The existing video foundation model, InternVideo2, has demonstrated strong potential in video understanding tasks due to its large parameter size and large-scale multimodal data pretraining. Building on this, we explored the transferability of InternVideo2 to video quality assessment under compression scenarios. To design a lightweight model suitable for this task, we proposed a distillation method that equips the smaller model with rich compression quality priors. Additionally, we examined the performance of different backbones during the distillation process. The results showed that, compared to other methods, our lightweight model distilled from InternVideo2 achieved excellent performance in compressed video quality assessment.
Problem

Research questions and friction points this paper is trying to address.

Enhance compressed video quality assessment
Transfer large foundation model capabilities
Develop lightweight model via distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distillation of large model
Lightweight compression quality assessment
Transferability of InternVideo2
🔎 Similar Papers
2024-02-20 · International Conference on Machine Learning · Citations: 30