🤖 AI Summary
Existing spectral token mixers do not constrain channel directions to stay consistent across frequencies, which makes it harder to integrate global spectral information effectively. To address this limitation, the work proposes CHASM, which adds a structured inductive bias. A learnable channel basis is shared across all frequencies to enforce global consistency, while each frequency keeps its own positive gains to accommodate local spectral variation. The operator is applied separably along the two spatial axes (height and width) for computational efficiency. Combined with the Fourier transform, CHASM serves as an effective spectral mixer. With the backbone architecture held fixed, it consistently outperforms existing spectral-mixing baselines on accelerated MRI reconstruction, undersampled MRI segmentation, and natural-image reconstruction.
📝 Abstract
Spectral token mixers based on Fourier transforms provide an efficient way to model global interactions in visual feature maps. Existing designs often either apply filter-wise spectral responses along fixed channel axes, or learn adaptive frequency-indexed channel mixing without explicitly aligning the channel directions used across frequencies. We propose CHASM, a Cross-frequency Harmonized Axis-Separable Mixer, as a structured middle ground. CHASM separates what should be shared from what should remain frequency-specific: all frequencies share a learned channel eigenbasis, while each frequency retains its own positive spectral gains. The shared basis makes channel directions comparable across the spectrum, whereas the positive gains preserve local spectral adaptivity. CHASM applies this structured operator separably along the height and width axes and is used as a drop-in replacement mixer inside existing backbones. We provide a structural characterization of the shared-basis operator family and evaluate CHASM through controlled same-backbone comparisons. Across accelerated MRI reconstruction, undersampled MRI segmentation, and natural-image reconstruction, CHASM consistently improves over same-backbone spectral-mixer baselines. Ablations show that removing the shared-basis constraint weakens performance, and randomizing coherent sampling geometry substantially reduces the gain, supporting cross-frequency harmonization as a useful inductive bias for spectral token operators.
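The shared-basis idea can be illustrated with a minimal NumPy sketch. It reads the operator as M_f = U diag(g_f) U^T at every frequency bin f, where U is one C×C basis shared by all bins and g_f is a positive per-bin gain vector. Several details are assumptions made for illustration and do not come from the abstract: U is kept orthogonal through a QR parameterization, gains are made positive with softplus, each axis uses a real FFT (`rfft`), and the width axis is mixed before the height axis. All function and parameter names (`orthogonal_basis`, `positive_gains`, `frequency_mixers`, `axis_mix`, `chasm_like_mix`, `U_w`, `g_w`, `U_h`, `g_h`) are hypothetical.

```python
import numpy as np


def orthogonal_basis(raw):
    """Map an unconstrained C x C matrix to an orthogonal basis via QR (assumed parameterization)."""
    q, r = np.linalg.qr(raw)
    # Flip column signs so diag(R) > 0; this makes the raw -> basis map unique.
    return q * np.sign(np.diag(r))


def positive_gains(raw):
    """Softplus log(1 + exp(raw)): maps unconstrained values to positive gains."""
    return np.logaddexp(0.0, raw)


def frequency_mixers(U, g):
    """Per-frequency channel mixers M_f = U diag(g[:, f]) U^T, returned with shape (F, C, C).

    U is shared by every frequency; only the gains g[:, f] change with f.
    """
    return np.einsum("dc,cf,ec->fde", U, g, U)


def axis_mix(x, U, g, axis):
    """Mix channels bin by bin in the Fourier domain along one spatial axis of x (C, H, W).

    g must have shape (C, n // 2 + 1), i.e. one gain vector per rfft bin along `axis`.
    """
    n = x.shape[axis]
    X = np.moveaxis(np.fft.rfft(x, axis=axis), axis, -1)  # (C, other, F), complex
    M = frequency_mixers(U, g)                             # (F, C, C), real
    Y = np.einsum("fdc,cof->dof", M, X)                    # apply M_f to every bin f
    return np.fft.irfft(np.moveaxis(Y, -1, axis), n=n, axis=axis)


def chasm_like_mix(x, params):
    """Hypothetical axis-separable mixer: width axis first, then height axis."""
    y = axis_mix(x, orthogonal_basis(params["U_w"]), positive_gains(params["g_w"]), axis=2)
    return axis_mix(y, orthogonal_basis(params["U_h"]), positive_gains(params["g_h"]), axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W = 4, 6, 8
    x = rng.standard_normal((C, H, W))
    params = {
        "U_w": rng.standard_normal((C, C)),
        "g_w": rng.standard_normal((C, W // 2 + 1)),
        "U_h": rng.standard_normal((C, C)),
        "g_h": rng.standard_normal((C, H // 2 + 1)),
    }
    print(chasm_like_mix(x, params).shape)  # (4, 6, 8)
```

In this form, every M_f is diagonal in the same basis U. Under the assumptions above, this means the per-frequency mixers commute with one another and, because the gains are positive, each one is symmetric positive definite. A variant that learns an independent C×C matrix per bin (no shared basis) would generally lose both properties. With unit gains, the mixer reduces to the identity. With gains that do not depend on frequency, it reduces to ordinary spatial-domain channel mixing by U diag(g) U^T.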