🤖 AI Summary
Image quality assessment (IQA) demands accurate perceptual modeling while balancing computational efficiency, yet existing architectures (e.g., CNNs, ViTs, Swin Transformers) struggle with either representational fidelity or inference cost. Method: This work pioneers the adaptation of vision Mamba, a state-space-model-based visual architecture, to IQA, systematically evaluating its efficacy across task-specific, universal, and cross-domain transfer scenarios. The authors propose StylePrompt, a lightweight parameter-efficient tuning paradigm that injects mean/variance statistics for low-overhead, high-fidelity cross-task adaptation. Contribution/Results: The approach achieves state-of-the-art performance on both synthetic and authentic IQA benchmarks, including cross-domain evaluations, outperforming Swin Transformer, ViT, and CNN baselines. It attains superior perceptual accuracy while significantly reducing computational cost, jointly optimizing performance and efficiency for IQA.
📝 Abstract
In this work, we present the first exploration of the recently popular foundation model, i.e., the State Space Model/Mamba, for image quality assessment (IQA), aiming to observe and exploit the perceptual potential of vision Mamba. A series of works on Mamba have shown its significant potential in various fields, e.g., segmentation and classification. However, the perceptual capability of Mamba remains under-explored. Consequently, we propose QMamba by revisiting and adapting the Mamba model for three crucial IQA tasks, i.e., task-specific, universal, and transferable IQA, which reveals its clear advantages over existing foundation models, e.g., Swin Transformer, ViT, and CNNs, in terms of both perception and computational cost. To improve the transferability of QMamba, we propose the StylePrompt tuning paradigm, where lightweight mean and variance prompts are injected to assist task-adaptive transfer learning of pre-trained QMamba for different downstream IQA tasks. Compared with existing prompt tuning strategies, StylePrompt enables better perceptual transfer at lower computational cost. Extensive experiments on multiple synthetic and authentic IQA datasets, as well as cross-dataset settings, demonstrate the effectiveness of the proposed QMamba. The code will be available at: https://github.com/bingo-G/QMamba.git
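The abstract describes StylePrompt as injecting lightweight mean and variance prompts to adapt a frozen pre-trained backbone. The paper's exact formulation is not given here, but the idea resembles AdaIN-style feature modulation: normalize per-channel feature statistics, then re-style them with small learnable mean/variance prompts. The sketch below is a hypothetical illustration of that mechanism; the function name `style_prompt`, the tensor shapes, and the normalization details are assumptions, not the authors' implementation.

```python
import numpy as np

def style_prompt(features, mean_prompt, std_prompt, eps=1e-5):
    """Hypothetical StylePrompt-style modulation (AdaIN-like).

    features:    (C, H, W) feature map from a frozen backbone
    mean_prompt: (C, 1, 1) learnable mean prompt (tuned per task)
    std_prompt:  (C, 1, 1) learnable variance/std prompt
    Only the prompts would be trained during transfer, keeping
    the adaptation parameter-efficient.
    """
    # Per-channel statistics of the incoming features
    mu = features.mean(axis=(1, 2), keepdims=True)
    sigma = features.std(axis=(1, 2), keepdims=True)
    # Normalize, then re-inject task-specific style statistics
    normalized = (features - mu) / (sigma + eps)
    return normalized * std_prompt + mean_prompt

# Toy usage: the output's channel statistics match the prompts
x = np.random.randn(4, 8, 8)
out = style_prompt(x, np.full((4, 1, 1), 0.5), np.full((4, 1, 1), 2.0))
```

Because only the (C,)-sized prompt vectors are learned, the number of tunable parameters is tiny compared with full fine-tuning, which is consistent with the low-overhead transfer claim in the abstract.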