🤖 AI Summary
Downsampling and upsampling operations break translation equivariance and invariance in traditional CNNs, which degrades their geometric robustness. To address this, we extend learnable polyphase sampling (LPS) to the complex domain, proposing the first LPS framework tailored for complex-valued neural networks. We further introduce a complex-to-real projection layer and apply Gumbel-Softmax to its output, so that the discrete phase selection can be optimized differentiably. By construction, the approach preserves exact translation equivariance and invariance. Experiments on polarimetric SAR image classification, reconstruction, and semantic segmentation demonstrate substantial improvements in geometric consistency and generalization performance, supporting both the theoretical soundness and the practical efficacy of equivariant modeling in the complex domain.
📝 Abstract
Convolutional neural networks have shown remarkable performance in recent years on various computer vision problems. However, the traditional convolutional neural network architecture lacks two critical properties, shift equivariance and shift invariance, both of which are broken by downsampling and upsampling operations. Although data augmentation can help a model learn shift invariance empirically, a more consistent and systematic approach is to design downsampling and upsampling layers that guarantee these properties by construction. Adaptive Polyphase Sampling (APS) laid the groundwork for shift invariance and was later extended to shift equivariance by Learnable Polyphase up/downsampling (LPS), applied to real-valued neural networks. In this paper, we extend LPS to complex-valued neural networks, both theoretically and through a novel building block: a projection layer from $\mathbb{C}$ to $\mathbb{R}$ placed before the Gumbel-Softmax. Finally, we evaluate this extension on several computer vision problems using polarimetric Synthetic Aperture Radar images, assessing the invariance property in classification tasks and the equivariance property in reconstruction and semantic segmentation.
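To make the idea of polyphase selection on complex-valued inputs concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes a stride of 2, uses the modulus as the projection from $\mathbb{C}$ to $\mathbb{R}$, and replaces the learned scoring network with a simple energy score; all function names (`polyphase_components`, `project_to_real`, `component_logits`, `gumbel_softmax`, `polyphase_downsample`) are hypothetical and chosen for illustration only.

```python
import numpy as np

def polyphase_components(x, stride=2):
    """Split a 2D array into stride*stride polyphase components."""
    return [x[i::stride, j::stride] for i in range(stride) for j in range(stride)]

def project_to_real(z):
    """Assumed C -> R projection: the modulus. The paper's layer may differ."""
    return np.abs(z)

def component_logits(x, stride=2):
    """Stand-in for a learned scorer: energy of each projected component.
    The score does not change when a component is circularly shifted."""
    return np.array([np.sum(project_to_real(c) ** 2)
                     for c in polyphase_components(x, stride)])

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed (soft) sample over components, as would be used during training."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))          # Gumbel(0, 1) noise
    z = (logits + g) / tau
    z = z - z.max()                  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def polyphase_downsample(x, stride=2):
    """Inference-time hard selection: keep the highest-scoring component."""
    k = int(np.argmax(component_logits(x, stride)))
    return polyphase_components(x, stride)[k]

# Usage: a random complex "image" and a circularly shifted copy of it.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
y = polyphase_downsample(x)
y_shifted = polyphase_downsample(np.roll(x, shift=(1, 1), axis=(0, 1)))
```

Under these assumptions, shifting the input circularly only permutes and rolls the polyphase components, so the selected component of the shifted input is a circular shift of the component selected for the original input. A plain strided slice such as `x[::2, ::2]` offers no such guarantee, because a one-pixel shift makes it sample an entirely different set of pixels.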