🤖 AI Summary
This work addresses the long-standing disconnect between image quality assessment (IQA) and exemplar-guided image processing. We propose DisQUE, a self-supervised disentangled representation learning framework that, for the first time, unifies both tasks within a shared content-appearance feature space. DisQUE couples a dual-stream network architecture with self-supervised contrastive learning to disentangle content and appearance representations without supervision. It introduces an appearance-transfer-based quality prediction module and an exemplar-driven feature modulation mechanism that supports appearance editing. On standard IQA benchmarks (e.g., LIVE, TID2013), DisQUE achieves state-of-the-art zero-shot cross-distortion quality prediction. In HDR tone mapping, it faithfully reproduces target appearances from only a few exemplars, demonstrating strong generalization. Our core contribution is a unified disentangled representation that jointly supports IQA and exemplar-guided processing, overcoming the limitations of conventional single-task modeling paradigms.
📝 Abstract
The deep learning revolution has strongly impacted low-level image processing tasks such as style/domain transfer, enhancement/restoration, and visual quality assessment. Despite often being treated separately, these tasks share a common theme of understanding, editing, or enhancing the appearance of input images without modifying the underlying content. We leverage this observation to develop a novel disentangled representation learning method that decomposes inputs into content and appearance features. The model is trained in a self-supervised manner, and we use the learned features to develop a new quality prediction model named DisQUE. We demonstrate through extensive evaluations that DisQUE achieves state-of-the-art accuracy across quality prediction tasks and distortion types. Moreover, we demonstrate that the same features may also be used for image processing tasks such as HDR tone mapping, where the desired output characteristics may be tuned using example input-output pairs.
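To make the content/appearance idea concrete, here is a minimal, illustrative sketch of contrastive disentanglement. The paper's actual architecture and losses are not specified in this abstract; the `content_encoder`/`appearance_encoder` splits and the InfoNCE loss below are hypothetical stand-ins. The sketch models each "image" as a feature vector and pulls together the content features of two renditions that share content but differ in appearance, while pushing away unrelated images.

```python
import numpy as np

rng = np.random.default_rng(0)

def content_encoder(x):
    # Hypothetical: first half of the feature vector stands in for "content".
    return x[: len(x) // 2]

def appearance_encoder(x):
    # Hypothetical: second half stands in for "appearance" (tone, style, etc.).
    return x[len(x) // 2:]

def info_nce(anchor, positive, negatives, tau=0.1):
    """Standard InfoNCE contrastive loss over cosine similarities."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive pair is index 0

# Two "renditions" of the same scene: identical content, different appearance
# (e.g., two different tone mappings of one HDR capture).
content = rng.normal(size=8)
x_a = np.concatenate([content, rng.normal(size=8)])
x_b = np.concatenate([content, rng.normal(size=8)])

# Content features of unrelated images act as negatives.
negatives = [
    content_encoder(rng.normal(size=16))
    for _ in range(4)
]

loss = info_nce(content_encoder(x_a), content_encoder(x_b), negatives)
print(f"contrastive loss on a matched-content pair: {loss:.4f}")
```

Because the two renditions share identical content features here, the loss on the matched pair is small; a learned encoder trained with such an objective is driven toward the same invariance, leaving appearance variation to the second stream.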