🤖 AI Summary
This study challenges the presumed universal superiority of global token mixing in MRI restoration, particularly when strong physical constraints already provide informative global priors. To this end, we design a minimal local gated CNN and a large-receptive-field variant, enabling a fair comparison against state-of-the-art global methods (based on self-attention or state-space models) under a unified training and evaluation protocol. Experiments show that local models are competitive in accelerated reconstruction and super-resolution, with global token mixing offering a clear advantage only in denoising with spatially heteroscedastic noise. With confounding factors controlled, the work shows that the efficacy of global modeling depends strongly on the specific degradation mechanism and the strength of the physical constraints, calling into question the assumption that global architectures are inherently superior.
📝 Abstract
Global token mixing, implemented via self-attention or state-space sequence models, has become a popular design choice for MRI restoration. However, MRI restoration tasks differ substantially in how their degradations vary over the image and k-space domains, and in the degree to which global coupling is already imposed by physics-driven data-consistency terms. In this work, we ask whether global token mixing is actually beneficial in each of three representative settings: accelerated MRI reconstruction with explicit data consistency, MRI super-resolution with k-space center cropping, and denoising of clinical carotid MRI data with spatially heteroscedastic noise. To reduce confounding factors, we establish a controlled testbed built around a minimal local gated CNN and its large-field variant, and benchmark both directly against state-of-the-art global models under aligned training and evaluation protocols. For accelerated MRI reconstruction, the minimal unrolled gated-CNN baseline is already highly competitive with recent token-mixing approaches on public reconstruction benchmarks, suggesting limited additional benefit when the forward model and data-consistency steps provide strong global constraints. For super-resolution, where low-frequency k-space data are largely preserved by the controlled low-pass degradation, local gated models remain competitive, and a lightweight large-field variant yields only modest improvements. In contrast, for denoising with pronounced spatially heteroscedastic noise, token-mixing models achieve the strongest overall performance, consistent with the need to estimate spatially varying reliability. In conclusion, our results demonstrate that the utility of global token mixing in MRI restoration is task-dependent, and that the choice of mixing mechanism should be tailored to the underlying imaging physics and degradation structure.
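
To make two of the physical operations named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single-coil Cartesian setting, models k-space center cropping as zero-filling everything outside a central block (the grid size is kept rather than reduced), and uses a hard data-consistency step that overwrites sampled k-space entries with the measurements. The helper names `fft2c`, `ifft2c`, `center_mask`, `lowpass_degrade`, and `hard_data_consistency` are hypothetical and chosen for illustration only.

```python
import numpy as np


def fft2c(image):
    """Centered, orthonormal 2D FFT: image -> k-space with DC at the array center."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image), norm="ortho"))


def ifft2c(kspace):
    """Inverse of fft2c: centered k-space -> image."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace), norm="ortho"))


def center_mask(shape, keep):
    """Boolean mask selecting a keep[0] x keep[1] block around the k-space center."""
    mask = np.zeros(shape, dtype=bool)
    r0 = (shape[0] - keep[0]) // 2
    c0 = (shape[1] - keep[1]) // 2
    mask[r0:r0 + keep[0], c0:c0 + keep[1]] = True
    return mask


def lowpass_degrade(image, keep):
    """Assumed super-resolution degradation: zero all k-space outside the center block."""
    mask = center_mask(image.shape, keep)
    return ifft2c(fft2c(image) * mask), mask


def hard_data_consistency(estimate, measured_kspace, mask):
    """Replace the sampled k-space entries of an estimate with the measured values."""
    k_est = fft2c(estimate)
    k_dc = np.where(mask, measured_kspace, k_est)
    return ifft2c(k_dc)


# Toy example on random data (illustrative only, not MRI data).
rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))

low_res, mask = lowpass_degrade(image, keep=(8, 8))
measured = fft2c(image) * mask

estimate = rng.standard_normal((16, 16))  # stand-in for a network output
corrected = hard_data_consistency(estimate, measured, mask)
```

Because every k-space sample contributes to every pixel through the Fourier transform, a data-consistency step of this kind updates the whole image at once. This is one way to read the abstract's remark that the forward model and data-consistency steps already provide global coupling, independent of the network architecture between them.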