🤖 AI Summary
To address three challenges in video quality assessment (VQA), namely the difficulty of modeling spatiotemporal perception, the incompatibility of enhancement strategies with pre-trained backbones, and the absence of adaptive restoration mechanisms, this paper proposes a free-energy-guided dual-branch eye-simulation framework. The framework decouples global aesthetic modeling from local structural and semantic modeling, incorporates a biologically inspired saccade prediction head to emulate dynamic visual attention, and introduces a video self-restoration mechanism grounded in the principle of free-energy minimization. High-order features are injected non-intrusively through collaborative patch-wise and full-frame enhancement, coupled with saliency-guided dynamic feature fusion. Evaluated on five mainstream VQA benchmarks, the method achieves state-of-the-art or highly competitive performance, improving both prediction accuracy and interpretability. The results validate the effectiveness of modeling neuro-perceptual mechanisms for VQA.
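For context, free-energy-guided quality models (as explored in prior IQA work) score a signal by how well an internal generative or restoration process can explain it. The sketch below states that idea with symbols chosen here purely for illustration, not the paper's notation:

$$
F(\mathbf{x}) \;\approx\; -\log p\left(\mathbf{x} \mid \hat{\mathbf{x}}\right), \qquad \hat{\mathbf{x}} = \mathcal{R}(\mathbf{x}),
$$

where $\mathbf{x}$ is a distorted frame, $\mathcal{R}$ is the self-restoration mechanism, and a larger residual between $\mathbf{x}$ and its restoration $\hat{\mathbf{x}}$ indicates higher "surprise" under the internal model and hence lower predicted quality.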
📝 Abstract
Free-energy-guided self-repair mechanisms have shown promising results in image quality assessment (IQA), but remain under-explored in video quality assessment (VQA), where temporal dynamics and model constraints pose unique challenges. Unlike static images, video content exhibits richer spatiotemporal complexity, making perceptual restoration more difficult. Moreover, VQA systems often rely on pre-trained backbones, which limits the direct integration of enhancement modules without affecting model stability. To address these issues, we propose EyeSimVQA, a novel VQA framework that incorporates free-energy-based self-repair. It adopts a dual-branch architecture, with an aesthetic branch for global perceptual evaluation and a technical branch for fine-grained structural and semantic analysis. Each branch integrates specialized enhancement modules tailored to its distinct visual input (resized full-frame images for the aesthetic branch, patch-based fragments for the technical branch) to simulate adaptive repair behaviors. We also explore a principled strategy for incorporating high-level visual features without disrupting the original backbone. In addition, we design a biologically inspired prediction head that models sweeping gaze dynamics to better fuse global and local representations for quality prediction. Experiments on five public VQA benchmarks demonstrate that EyeSimVQA achieves competitive or superior performance compared to state-of-the-art methods, while offering improved interpretability through its biologically grounded design.
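To make the dual-branch design concrete, here is a minimal PyTorch-style sketch. Everything in it (the backbone stubs, the saliency-style gate, the tensor shapes) is an illustrative assumption for exposition, not the authors' implementation:

```python
# Hypothetical sketch of a dual-branch VQA model in the spirit of EyeSimVQA.
# Backbones, module names, and the fusion rule are assumptions, not the paper's code.
import torch
import torch.nn as nn

class DualBranchVQA(nn.Module):
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Aesthetic branch: global perceptual features from resized full frames.
        self.aesthetic = nn.Sequential(
            nn.Conv3d(3, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
        )
        # Technical branch: fine-grained features from patch-based fragments.
        self.technical = nn.Sequential(
            nn.Conv3d(3, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
        )
        # Saliency-style gate: per-video weights for fusing the two branches,
        # standing in for the paper's gaze-sweeping prediction head.
        self.gate = nn.Sequential(nn.Linear(2 * feat_dim, 2), nn.Softmax(dim=-1))
        self.head = nn.Linear(feat_dim, 1)  # scalar quality score

    def forward(self, full_frames: torch.Tensor, fragments: torch.Tensor) -> torch.Tensor:
        # Both inputs: (B, 3, T, H, W) video tensors.
        g = self.aesthetic(full_frames)        # global representation, (B, feat_dim)
        l = self.technical(fragments)          # local representation, (B, feat_dim)
        w = self.gate(torch.cat([g, l], -1))   # fusion weights, (B, 2)
        fused = w[:, :1] * g + w[:, 1:] * l    # dynamic feature fusion
        return self.head(fused).squeeze(-1)    # (B,) predicted quality

model = DualBranchVQA()
frames = torch.randn(2, 3, 4, 64, 64)   # resized full-frame clips
frags = torch.randn(2, 3, 4, 64, 64)    # spatially sampled fragments
print(model(frames, frags).shape)        # torch.Size([2])
```

The gate here plays the role the abstract assigns to saliency-guided fusion of global and local representations; in the actual model, the gaze-modeling prediction head and per-branch enhancement modules would replace these stubs.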