SHAMISA: SHAped Modeling of Implicit Structural Associations for Self-supervised No-Reference Image Quality Assessment

📅 2026-03-13
🤖 AI Summary
This work addresses a scalability limitation of no-reference image quality assessment (NR-IQA): training typically relies on large numbers of costly human quality annotations. The authors propose SHAMISA, a non-contrastive self-supervised framework that learns perceptual quality representations from unlabeled distorted images through explicitly structured relational supervision. Its core idea is implicit structural associations, which are soft, controllable relations between images that are both distortion-aware and content-sensitive. A compositional distortion engine samples degradations from continuous parameter spaces, varying one distortion factor at a time. Dual-source relation graphs then combine known degradation metadata with emergent feature-space affinities to guide training. A convolutional encoder is trained under this supervision without human quality labels or contrastive losses, then frozen, and quality is predicted by a linear regressor on its features. Experiments on synthetic, authentic, and cross-dataset NR-IQA benchmarks report strong overall performance with improved cross-dataset generalization and robustness.
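The evaluation step mentioned above can be sketched as follows. In that step, features from a frozen encoder are mapped to quality scores by a linear regressor. This is a minimal illustration, not the authors' code. It assumes closed-form ridge regression as the linear regressor and Spearman rank correlation (SRCC) as the metric. Random vectors stand in for encoder embeddings, and synthetic scores stand in for human opinion scores. The function names are hypothetical.

```python
import numpy as np


def fit_linear_regressor(features, scores, l2=1e-3):
    """Closed-form ridge regression from frozen features to quality scores."""
    n, d = features.shape
    X = np.hstack([features, np.ones((n, 1))])  # append a bias column
    reg = l2 * np.eye(d + 1)
    reg[-1, -1] = 0.0                            # leave the bias unpenalized
    return np.linalg.solve(X.T @ X + reg, X.T @ scores)


def predict(features, w):
    """Apply the fitted linear regressor (weights plus bias) to features."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ w


def srcc(a, b):
    """Spearman rank correlation; assumes no tied values."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]


# Hypothetical data: random vectors stand in for frozen encoder embeddings,
# and synthetic scores stand in for human mean opinion scores (MOS).
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 16))
mos = feats @ rng.normal(size=16) + 0.01 * rng.normal(size=200)

w = fit_linear_regressor(feats[:150], mos[:150])  # only the regressor is fit
score = srcc(predict(feats[150:], w), mos[150:])
print(f"held-out SRCC: {score:.3f}")
```

Because the encoder stays frozen, only d + 1 weights are learned. The probe therefore mainly measures how linearly the learned representation encodes quality.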

📝 Abstract
No-Reference Image Quality Assessment (NR-IQA) aims to estimate perceptual quality without access to a reference image of pristine quality. Learning an NR-IQA model faces a fundamental bottleneck: its need for a large number of costly human perceptual labels. We propose SHAMISA, a non-contrastive self-supervised framework that learns from unlabeled distorted images by leveraging explicitly structured relational supervision. Unlike prior methods that impose rigid, binary similarity constraints, SHAMISA introduces implicit structural associations, defined as soft, controllable relations that are both distortion-aware and content-sensitive, inferred from synthetic metadata and intrinsic feature structure. A key innovation is our compositional distortion engine, which generates an uncountable family of degradations from continuous parameter spaces, grouped so that only one distortion factor varies at a time. This enables fine-grained control over representational similarity during training: images with shared distortion patterns are pulled together in the embedding space, while severity variations produce structured, predictable shifts. We integrate these insights via dual-source relation graphs that encode both known degradation profiles and emergent structural affinities to guide the learning process throughout training. A convolutional encoder is trained under this supervision and then frozen for inference, with quality prediction performed by a linear regressor on its features. Extensive experiments on synthetic, authentic, and cross-dataset NR-IQA benchmarks demonstrate that SHAMISA achieves strong overall performance with improved cross-dataset generalization and robustness, all without human quality annotations or contrastive losses.
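The sketch below makes one-factor-at-a-time distortion groups and a soft, dual-source relation graph concrete. It is not the SHAMISA implementation. The following are all illustrative assumptions:

- the three distortion factors (`noise_std`, `contrast`, `gamma`) and their ranges;
- the Gaussian-kernel affinity on normalized parameters;
- the cosine affinity on features;
- the fixed blend weight `alpha`.

Flattened pixels stand in for the encoder's embeddings.

```python
import numpy as np

# Hypothetical distortion factors with continuous parameter ranges (lo, hi).
FACTORS = {
    "noise_std": (0.0, 0.2),   # additive Gaussian noise level
    "contrast": (0.5, 1.0),    # contrast scaling around mid-gray
    "gamma": (0.7, 1.5),       # tone-curve exponent
}
NAMES = list(FACTORS)


def apply_distortions(img, params, rng):
    """Compose the distortions in a fixed order on an image with values in [0, 1]."""
    out = img ** params["gamma"]
    out = 0.5 + params["contrast"] * (out - 0.5)
    out = out + rng.normal(0.0, params["noise_std"], size=out.shape)
    return np.clip(out, 0.0, 1.0)


def sample_group(img, rng, levels=4):
    """Draw a random base setting, then sweep exactly one factor over `levels` values."""
    base = {k: rng.uniform(lo, hi) for k, (lo, hi) in FACTORS.items()}
    varied = NAMES[rng.integers(len(NAMES))]
    lo, hi = FACTORS[varied]
    images, metas = [], []
    for v in np.linspace(lo, hi, levels):
        params = {**base, varied: float(v)}
        images.append(apply_distortions(img, params, rng))
        metas.append([params[k] for k in NAMES])
    return np.stack(images), np.array(metas), varied


def metadata_relation(metas, tau=0.25):
    """Soft affinities from known parameters: Gaussian kernel on normalized values."""
    lo = np.array([FACTORS[k][0] for k in NAMES])
    hi = np.array([FACTORS[k][1] for k in NAMES])
    z = (metas - lo) / (hi - lo)
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * tau**2))


def feature_relation(feats):
    """Soft affinities from embedding geometry: cosine similarity rescaled to [0, 1]."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return 0.5 * (1.0 + f @ f.T)


def dual_source_relation(metas, feats, alpha=0.5):
    """Blend the metadata graph and the feature graph with a fixed weight alpha."""
    return alpha * metadata_relation(metas) + (1 - alpha) * feature_relation(feats)


rng = np.random.default_rng(0)
img = rng.uniform(size=(32, 32))           # stand-in for a pristine image
imgs, metas, varied = sample_group(img, rng)
feats = imgs.reshape(len(imgs), -1)        # flattened pixels stand in for encoder features
R = dual_source_relation(metas, feats)
print(varied)
print(R.round(2))
```

Within a group, images differ only in severity along the swept factor. They therefore receive graded metadata affinities, with adjacent levels scoring higher than the extremes. This is one simple way to express structured severity shifts rather than a binary same/different label.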
Problem

Research questions and friction points this paper is trying to address.

No-Reference Image Quality Assessment
self-supervised learning
perceptual quality
human labels
distortion modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-supervised learning
implicit structural associations
compositional distortion engine
no-reference image quality assessment
dual-source relation graphs
Mahdi Naseri
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
Zhou Wang
Professor, Electrical and Computer Engineering, University of Waterloo
image processing, image quality assessment, multimedia, quality of experience, computational vision