Q-Mamba: On First Exploration of Vision Mamba for Image Quality Assessment

📅 2024-06-13
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Image quality assessment (IQA) demands accurate perceptual modeling while balancing computational efficiency—yet existing architectures (e.g., CNNs, ViTs, Swin Transformers) struggle with either representational fidelity or inference cost. Method: This work pioneers the adaptation of Visual Mamba—a state-space model-based visual architecture—to IQA, systematically evaluating its efficacy across task-specific, general-purpose, and cross-domain transfer scenarios. We propose StylePrompt, a lightweight parameter-efficient tuning paradigm leveraging mean/variance statistics for low-overhead, high-fidelity cross-task adaptation, integrated with multi-scale feature fusion and perceptual alignment. Contribution/Results: Our approach achieves state-of-the-art performance on both synthetic and authentic IQA benchmarks—including cross-domain evaluations—outperforming Swin Transformer, ViT, and CNN baselines. It attains superior perceptual accuracy while significantly reducing computational cost, establishing a new paradigm for IQA that jointly optimizes performance and efficiency.

📝 Abstract
In this work, we present the first exploration of the recently popular foundation model, i.e., the State Space Model/Mamba, in image quality assessment (IQA), aiming to observe and excavate the perception potential of vision Mamba. A series of works on Mamba has shown its significant potential in various fields, e.g., segmentation and classification. However, the perception capability of Mamba remains under-explored. Consequently, we propose QMamba by revisiting and adapting the Mamba model for three crucial IQA tasks, i.e., task-specific, universal, and transferable IQA, which reveals its clear advantages over existing foundation models, e.g., Swin Transformer, ViT, and CNNs, in terms of perception quality and computational cost. To improve the transferability of QMamba, we propose the StylePrompt tuning paradigm, in which lightweight mean and variance prompts are injected to assist task-adaptive transfer learning of the pre-trained QMamba for different downstream IQA tasks. Compared with existing prompt tuning strategies, our StylePrompt enables better perceptual transfer at lower computational cost. Extensive experiments on multiple synthetic and authentic IQA datasets, as well as cross-dataset IQA settings, demonstrate the effectiveness of our proposed QMamba. The code will be available at: https://github.com/bingo-G/QMamba.git
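The abstract describes StylePrompt as injecting lightweight mean and variance prompts to adapt a pre-trained backbone to downstream IQA tasks. The paper does not spell out the exact formulation here, but prompting via first- and second-order channel statistics typically resembles AdaIN-style feature modulation. The sketch below illustrates that general idea under assumed shapes and names (`style_prompt`, `mean_prompt`, `std_prompt` are illustrative, not the paper's actual API):

```python
import numpy as np

def style_prompt(features, mean_prompt, std_prompt, eps=1e-5):
    """AdaIN-style modulation with learnable mean/variance prompts.

    A minimal sketch of statistics-based prompting, not the paper's
    actual StylePrompt implementation.

    features:    (C, H, W) feature map from a frozen backbone stage
    mean_prompt: (C,) learnable per-channel mean prompt
    std_prompt:  (C,) learnable per-channel std prompt
    """
    # Normalize each channel to zero mean / unit variance...
    mu = features.mean(axis=(1, 2), keepdims=True)
    sigma = features.std(axis=(1, 2), keepdims=True)
    normalized = (features - mu) / (sigma + eps)
    # ...then re-style it with the lightweight prompts. Only the
    # 2*C prompt parameters would be tuned per downstream task.
    return normalized * std_prompt[:, None, None] + mean_prompt[:, None, None]
```

Because only the per-channel prompt vectors are trained, such a scheme adds on the order of 2*C parameters per modulated stage, which is consistent with the "low-overhead" transfer claim in the summary above.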
Problem

Research questions and friction points this paper is trying to address.

Exploring Vision Mamba for Image Quality Assessment (IQA)
Adapting Mamba for task-specific, universal, transferable IQA
Improving transferability with StylePrompt tuning paradigm
Innovation

Methods, ideas, or system contributions that make the work stand out.

First use of Mamba model in IQA tasks
StylePrompt tuning for better transferability
Outperforms Swin Transformer, ViT, and CNNs