DGIQA: Depth-guided Feature Attention and Refinement for Generalizable Image Quality Assessment

📅 2025-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor generalization of no-reference image quality assessment (NR-IQA) under unknown natural distortions—such as low illumination, haze, and lens flare—this paper proposes a depth-guided cross-modal attention and refinement framework. Methodologically, it introduces (1) Depth-CAR, the first depth-estimation-guided architecture for scene-aware representation learning; (2) the Transformer-CNN Bridge (TCB), enabling efficient fusion of global semantics and local details; and (3) a multi-scale feature refinement module with attention-weighted projection, balancing accuracy and computational efficiency. The proposed method achieves state-of-the-art performance on both synthetic and real-world distortion datasets. It demonstrates superior cross-dataset generalization capability compared to existing approaches, while maintaining high inference speed and training efficiency.

📝 Abstract
A long-standing challenge for no-reference image quality assessment (NR-IQA) models, which learn from human subjective perception, is their poor generalization to unseen natural distortions. To address this, we integrate a novel Depth-Guided cross-attention and refinement (Depth-CAR) mechanism, which distills scene depth and spatial features into a structure-aware representation for improved NR-IQA. This incorporates knowledge of object saliency and relative scene contrast for more discriminative feature learning. Additionally, we introduce a Transformer-CNN Bridge (TCB) to fuse high-level global contextual dependencies from a transformer backbone with local spatial features captured by a set of hierarchical convolutional neural network (CNN) layers. We implement TCB and Depth-CAR as multimodal attention-based projection functions that select the most informative features, which also improves training time and inference efficiency. Experimental results demonstrate that our proposed DGIQA model achieves state-of-the-art (SOTA) performance on both synthetic and authentic benchmark datasets. More importantly, DGIQA outperforms SOTA models in cross-dataset evaluations as well as in assessing natural image distortions such as low-light effects, hazy conditions, and lens flares.
Problem

Research questions and friction points this paper is trying to address.

Improving generalization in no-reference image quality assessment (NR-IQA)
Enhancing feature learning with depth-guided attention and refinement
Fusing global and local features for better distortion assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Depth-Guided cross-attention and refinement mechanism
Transformer-CNN Bridge for feature fusion
Multimodal attention-based projection functions
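As a rough illustration of the cross-attention idea behind Depth-CAR (this is not the authors' code; the tensor shapes, the use of depth features as queries, and all variable names are assumptions for this sketch), one modality's features can attend over another's via plain scaled dot-product attention:

```python
import numpy as np

def cross_attention(query_feats, kv_feats):
    """Scaled dot-product cross-attention: queries from one modality
    (e.g., depth) attend over keys/values from another (e.g., RGB)."""
    d = query_feats.shape[-1]
    scores = query_feats @ kv_feats.T / np.sqrt(d)          # (Nq, Nk)
    # Row-wise softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ kv_feats                               # (Nq, d)

# Hypothetical token features; real models would produce these
# from depth-estimation and RGB backbones, respectively.
rng = np.random.default_rng(0)
depth_tokens = rng.standard_normal((16, 64))
rgb_tokens = rng.standard_normal((16, 64))
fused = cross_attention(depth_tokens, rgb_tokens)
```

In the paper's framing, such attention-weighted projections select the most informative features from the fused representation; learned query/key/value projections and multi-head splitting, omitted here for brevity, would be the usual additions.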
Vaishnav Ramesh
RoboPI laboratory, Department of ECE, University of Florida
Junliang Liu
RoboPI laboratory, Department of ECE, University of Florida
Haining Wang
RoboPI laboratory, Department of ECE, University of Florida
Md Jahidul Islam
Assistant Professor, University of Florida
Robotics, Visual Perception, Artificial Intelligence, Marine Robotics, Telerobotics