🤖 AI Summary
This work proposes SONIC, a convolution method parameterized continuously in the spectral domain. It targets two limitations: convolutional neural networks are constrained by local receptive fields and struggle to capture global context, while vision Transformers lack spatial inductive biases and depend on fixed patch partitioning and positional encodings. Using a small set of shared, direction-aware continuous spectral basis functions, the method defines smooth convolutional kernels over the entire frequency domain, which gives both a global receptive field and adaptability across resolutions. The approach combines structural priors with global modeling capacity and shows improved robustness to geometric transformations, noise, and resolution shifts. Across synthetic benchmarks, image classification, and 3D medical imaging tasks, it matches or surpasses existing convolutional, attention-based, and spectral methods with an order of magnitude fewer parameters.
📝 Abstract
Convolutional Neural Networks (CNNs) rely on fixed-size kernels scanning local patches, which limits their ability to capture global context or long-range dependencies without very deep architectures. Vision Transformers (ViTs), in turn, provide global connectivity but lack spatial inductive bias, depend on explicit positional encodings, and remain tied to the initial patch size. Bridging these limitations requires a representation that is both structured and global. We introduce SONIC (Spectral Oriented Neural Invariant Convolutions), a continuous spectral parameterisation that models convolutional operators using a small set of shared, orientation-selective components. These components define smooth responses across the full frequency domain, yielding global receptive fields and filters that adapt naturally across resolutions. Across synthetic benchmarks, large-scale image classification, and 3D medical datasets, SONIC shows improved robustness to geometric transformations, noise, and resolution shifts, and matches or exceeds convolutional, attention-based, and prior spectral architectures with an order of magnitude fewer parameters. These results demonstrate that continuous, orientation-aware spectral parameterisations provide a principled and scalable alternative to conventional spatial and spectral operators.
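To make the idea of a continuous spectral parameterisation concrete, the NumPy sketch below evaluates a smooth, orientation-selective frequency response on whatever FFT grid the input size implies, then applies it by pointwise multiplication in the Fourier domain. It is not the SONIC parameterisation, which the abstract does not specify. The Gaussian radial band, the angular lobe, and every name used here (`oriented_response`, `spectral_conv`, `radial_center`, `radial_bw`, `angular_bw`) are assumptions chosen purely for illustration.

```python
import numpy as np


def oriented_response(height, width, theta,
                      radial_center=0.25, radial_bw=0.1, angular_bw=0.5):
    """Smooth oriented frequency response sampled on the rFFT grid.

    Hypothetical functional form: a Gaussian band in radial frequency
    multiplied by an angular lobe centred on orientation `theta`.
    Frequencies are in cycles per pixel, so the same parameters describe
    the same filter at any resolution.
    """
    fy = np.fft.fftfreq(height)[:, None]   # shape (height, 1)
    fx = np.fft.rfftfreq(width)[None, :]   # shape (1, width // 2 + 1)
    radius = np.hypot(fx, fy)
    angle = np.arctan2(fy, fx)
    radial = np.exp(-0.5 * ((radius - radial_center) / radial_bw) ** 2)
    # cos(2 * angle) is pi-periodic, so theta and theta + pi give one filter.
    angular = np.exp((np.cos(2.0 * (angle - theta)) - 1.0) / angular_bw ** 2)
    return radial * angular


def spectral_conv(image, thetas, weights):
    """Filter a 2D image with a weighted sum of oriented responses."""
    height, width = image.shape
    response = sum(w * oriented_response(height, width, t)
                   for w, t in zip(weights, thetas))
    spectrum = np.fft.rfft2(image)
    # Pointwise multiplication in frequency is circular convolution in space.
    return np.fft.irfft2(spectrum * response, s=image.shape)


rng = np.random.default_rng(0)
image = rng.standard_normal((48, 40))
filtered = spectral_conv(image, thetas=[0.0, np.pi / 4], weights=[1.0, -0.5])
```

Because the response is a function of frequency rather than a stored grid of weights, the handful of parameters per component does not grow with image size, and the filter can be re-sampled for any input resolution without retraining or interpolation of a fixed kernel.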