🤖 AI Summary
This work addresses the scalability bottleneck in ultra-high-definition image restoration, where pixel-wise methods incur prohibitive computational costs. To overcome this challenge, the authors propose C²SSM, a novel semantic clustering–centric visual modeling paradigm. The model employs a neural parametric mixture to extract a sparse set of semantic centroids, leverages a state space model (SSM) for dual-path sequential modeling of these cluster centers, and propagates the resulting global context back to all pixels via a similarity-based distribution. By avoiding per-pixel serial computation, C²SSM achieves state-of-the-art performance across five ultra-high-definition restoration tasks while substantially reducing computational overhead, balancing efficiency with fine-grained detail preservation.
📝 Abstract
Ultra-High-Definition (UHD) image restoration is trapped in a scalability crisis: existing models, bound to pixel-wise operations, demand unsustainable computation. While state space models (SSMs) like Mamba promise linear complexity, their pixel-serial scanning remains a fundamental bottleneck for the millions of pixels in UHD content. We ask: must we process every pixel to understand the image? This paper introduces C²SSM, a visual state space model that breaks this taboo by shifting from pixel-serial to cluster-serial scanning. Our core discovery is that the rich feature distribution of a UHD image can be distilled into a sparse set of semantic centroids via a neural-parameterized mixture model. C²SSM leverages this to reformulate global modeling into a novel dual-path process: it scans and reasons over a handful of cluster centers, then diffuses the global context back to all pixels through a principled similarity distribution, all while a lightweight modulator preserves fine details. This cluster-centric paradigm achieves a decisive leap in efficiency, slashing computational costs while establishing new state-of-the-art results across five UHD restoration tasks. More than a solution, C²SSM charts a new course for efficient large-scale vision: scan clusters, not pixels.
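To make the cluster-serial idea concrete, here is a minimal NumPy sketch of the three-stage pipeline the abstract describes: distill pixel features into a few centroids, scan the centroids sequentially, then diffuse the scanned context back to every pixel by similarity. This is an illustration only, and every component here is a hedged stand-in: soft k-means in place of the paper's neural-parameterized mixture model, a toy diagonal linear recurrence in place of the actual Mamba-style SSM block, and no lightweight detail modulator. Function names (`soft_kmeans`, `ssm_scan`, `cluster_serial_block`) are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_kmeans(X, K, iters=10, tau=1.0):
    """Stand-in for the paper's neural-parameterized mixture:
    soft k-means distilling N pixel features into K semantic centroids.
    Returns centroids C (K, D) and soft assignments A (N, K)."""
    C = X[rng.choice(len(X), K, replace=False)]
    for _ in range(iters):
        # negative squared distance -> softmax assignment over clusters
        logits = -((X[:, None, :] - C[None, :, :]) ** 2).sum(-1) / tau
        A = np.exp(logits - logits.max(1, keepdims=True))
        A /= A.sum(1, keepdims=True)
        # re-estimate centroids as assignment-weighted means
        C = (A.T @ X) / (A.sum(0)[:, None] + 1e-8)
    return C, A

def ssm_scan(C, a=0.9):
    """Illustrative linear recurrence over the K cluster centers
    (a toy diagonal SSM, not the actual Mamba block):
    h_t = a * h_{t-1} + (1 - a) * c_t."""
    h = np.zeros(C.shape[1])
    out = np.empty_like(C)
    for t, c in enumerate(C):
        h = a * h + (1 - a) * c
        out[t] = h
    return out

def cluster_serial_block(X, K=8):
    """Cluster-serial global modeling: O(K) scan instead of O(N)."""
    C, A = soft_kmeans(X, K)
    # dual-path: scan the centroid sequence forward and backward
    Yf = ssm_scan(C)
    Yb = ssm_scan(C[::-1])[::-1]
    Y = 0.5 * (Yf + Yb)
    # diffuse global context back to all pixels via the soft assignments
    return A @ Y            # (N, D) per-pixel global context

X = rng.standard_normal((4096, 16))   # toy flattened "UHD" feature map
ctx = cluster_serial_block(X)
print(ctx.shape)                      # → (4096, 16)
```

The key point the sketch captures is the complexity shift: the serial recurrence runs over K = 8 cluster centers rather than 4096 pixels, while the per-pixel work (assignment and diffusion) is fully parallel matrix algebra.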