EyeSim-VQA: A Free-Energy-Guided Eye Simulation Framework for Video Quality Assessment

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address challenges in video quality assessment (VQA)—including difficulty in spatiotemporal perception modeling, incompatibility of backbone enhancement strategies, and absence of adaptive restoration mechanisms—this paper proposes a free-energy-guided dual-branch eye-simulation framework. The framework decouples global aesthetic and local structural semantic modeling, incorporates a biologically inspired saccade prediction head to emulate dynamic visual attention, and designs a video self-restoration mechanism grounded in the principle of free-energy minimization. It achieves non-intrusive injection of high-order features via patch-wise and full-frame collaborative enhancement, coupled with saliency-guided dynamic feature fusion. Evaluated on five mainstream VQA benchmarks, the method achieves state-of-the-art or leading performance, significantly improving both prediction accuracy and interpretability. Results validate the effectiveness of neuro-perceptual mechanism modeling for VQA.
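The summary's "video self-restoration mechanism grounded in the principle of free-energy minimization" follows the free-energy view of perception: an internal generative model tries to reconstruct (repair) the input, and the unexplained residual tracks perceived degradation. Below is a minimal illustrative sketch of that idea only, assuming a crude box-blur stand-in for the internal model; the function names and the blur choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def box_blur(img, k=5):
    """Crude stand-in for an internal generative model: each pixel is
    'predicted' from the mean of its k-by-k neighborhood."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def free_energy_proxy(frame):
    """Mean squared residual between a frame and its internal reconstruction.
    Under the free-energy view, content the model cannot 'repair'
    (a larger residual) suggests lower perceived quality."""
    recon = box_blur(frame)
    return float(np.mean((frame - recon) ** 2))

# Smooth content is easy to predict; added noise is not.
clean = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
noisy = clean + 0.2 * np.random.default_rng(0).standard_normal(clean.shape)
fe_clean = free_energy_proxy(clean)
fe_noisy = free_energy_proxy(noisy)
```

A real system would replace the box blur with a learned restoration network; the residual-as-quality-signal structure is the part this sketch shows.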

📝 Abstract
Free-energy-guided self-repair mechanisms have shown promising results in image quality assessment (IQA), but remain under-explored in video quality assessment (VQA), where temporal dynamics and model constraints pose unique challenges. Unlike static images, video content exhibits richer spatiotemporal complexity, making perceptual restoration more difficult. Moreover, VQA systems often rely on pre-trained backbones, which limits the direct integration of enhancement modules without affecting model stability. To address these issues, we propose EyeSimVQA, a novel VQA framework that incorporates free-energy-based self-repair. It adopts a dual-branch architecture, with an aesthetic branch for global perceptual evaluation and a technical branch for fine-grained structural and semantic analysis. Each branch integrates specialized enhancement modules tailored to distinct visual inputs (resized full-frame images and patch-based fragments) to simulate adaptive repair behaviors. We also explore a principled strategy for incorporating high-level visual features without disrupting the original backbone. In addition, we design a biologically inspired prediction head that models sweeping gaze dynamics to better fuse global and local representations for quality prediction. Experiments on five public VQA benchmarks demonstrate that EyeSimVQA achieves competitive or superior performance compared to state-of-the-art methods, while offering improved interpretability through its biologically grounded design.
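The dual-branch design described above (an aesthetic branch on resized full frames, a technical branch on patch fragments, and saliency-guided fusion of the two) can be sketched at a high level as follows. This is a minimal numpy sketch under stated assumptions: the pooling "branches", the feature-norm saliency weighting, and the `w_global` mixing weight are all illustrative stand-ins, not the paper's modules.

```python
import numpy as np

def aesthetic_branch(frame):
    """Global perceptual descriptor from a resized full frame
    (stand-in: spatial mean pooling over an H x W x C array)."""
    return frame.mean(axis=(0, 1))  # shape (C,)

def technical_branch(patches):
    """Local structural descriptors from patch fragments
    (stand-in: per-patch mean pooling over an N x h x w x C array)."""
    return patches.mean(axis=(1, 2, 3) if patches.ndim == 4 else (1, 2))

def saliency_weights(patch_feats):
    """Softmax over per-patch feature norms, standing in for a
    learned saliency map that weights the local features."""
    scores = np.linalg.norm(np.atleast_2d(patch_feats), axis=1)
    e = np.exp(scores - scores.max())
    return e / e.sum()

def fuse_and_predict(frame, patches, w_global=0.5):
    """Saliency-weighted fusion of global and local descriptors,
    collapsed to a scalar quality proxy."""
    g = aesthetic_branch(frame)                      # (C,)
    p = patches.mean(axis=(1, 2))                    # (N, C) per-patch descriptors
    w = saliency_weights(p)                          # (N,) saliency weights
    local = (w[:, None] * p).sum(axis=0)             # (C,) weighted local summary
    fused = w_global * g + (1.0 - w_global) * local  # dynamic feature fusion
    return float(fused.mean())

rng = np.random.default_rng(0)
frame = rng.random((224, 224, 3))      # resized full frame
patches = rng.random((16, 32, 32, 3))  # 16 patch fragments
score = fuse_and_predict(frame, patches)
```

In the actual framework the two branches are backbone feature extractors and the fusion is driven by the saccade/gaze prediction head; the sketch only shows the global-plus-weighted-local fusion pattern.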
Problem

Research questions and friction points this paper is trying to address.

Extends free-energy self-repair from IQA to VQA challenges
Integrates enhancement modules without disrupting pre-trained backbones
Models gaze dynamics for improved quality prediction fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Free-energy-guided dual-branch VQA framework
Specialized enhancement modules for visual inputs
Biologically inspired gaze dynamics prediction head
Zhaoyang Wang
University of North Carolina at Chapel Hill
NLP, LLM Alignment, LLM Reasoning
Wen Lu
State Key Laboratory of Integrated Services Networks, School of Electronic Engineering, Xidian University, Xi’an, Shaanxi 710071, China
Jie Li
State Key Laboratory of Integrated Services Networks, School of Electronic Engineering, Xidian University, Xi’an, Shaanxi 710071, China
Lihuo He
Professor, Xidian University
Image/Video Quality Assessment, Visual Perception
Maoguo Gong
Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xidian University, Xi’an 710071, China, and College of Mathematical Science, Inner Mongolia Normal University, Hohhot 010028, China
Xinbo Gao
School of Electronic Engineering, Xidian University, Xi’an 710071, China