🤖 AI Summary
This work analyzes the scale generalization properties of scale-covariant and scale-invariant Gaussian derivative networks, that is, how well they handle spatial scales not seen during training. To evaluate scale robustness, we construct rescaled variants of Fashion-MNIST and CIFAR-10 with scaling variations over a factor of 4 in the test data, and additionally benchmark on the existing STIR datasets. We propose three extensions: (1) scale-channel dropout, which applies dropout across the scale channels as a regularizer during training; (2) spatial max pooling after the final layer, which enables localization of off-center objects; and (3) average pooling over scales as an alternative to max pooling over scales for fusing feature responses across scale channels. The architecture employs discretized Gaussian kernels and central difference operators to preserve scale covariance and invariance, while activation maps and receptive field visualizations support interpretability. Experiments demonstrate better scale generalization on unseen-scale test data than previously reported for other types of deep networks, with the discretization based on the discrete analogue of the Gaussian kernel performing best or among the best of the alternatives evaluated, along with precise object localization and high model transparency.
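A minimal PyTorch-style sketch of how scale-channel dropout and the two scale-pooling variants could be combined is shown below. The function name `fuse_scale_channels`, the tensor layout `(batch, num_scales, num_classes)` and the dropout rate are illustrative assumptions, not the authors' implementation.

```python
import torch

def fuse_scale_channels(logits, p_drop=0.2, pooling="max", training=True):
    """Hypothetical sketch: fuse per-scale logits into a scale-invariant prediction.

    logits: tensor of shape (batch, num_scales, num_classes), one output per
            scale channel of a scale-covariant backbone.
    """
    if training and p_drop > 0.0:
        # Scale-channel dropout: randomly zero out entire scale channels, so the
        # classifier cannot rely on any single scale during training.
        keep = (torch.rand(logits.shape[0], logits.shape[1], 1,
                           device=logits.device) > p_drop).float()
        logits = logits * keep / (1.0 - p_drop)

    if pooling == "max":
        # Max pooling over scales: keep the strongest response across scale channels.
        fused, _ = logits.max(dim=1)
    else:
        # Average pooling over scales: fuse responses from all scale channels.
        fused = logits.mean(dim=1)
    return fused

# Example: batch of 32, 8 scale channels, 10 classes.
per_scale_logits = torch.randn(32, 8, 10)
print(fuse_scale_channels(per_scale_logits, pooling="avg").shape)  # torch.Size([32, 10])
```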
📝 Abstract
This paper presents an in-depth analysis of the scale generalisation properties of the scale-covariant and scale-invariant Gaussian derivative networks, complemented with both conceptual and algorithmic extensions. For this purpose, Gaussian derivative networks are evaluated on new rescaled versions of the Fashion-MNIST and the CIFAR-10 datasets, with spatial scaling variations over a factor of 4 in the testing data that are not present in the training data. Additionally, evaluations on the previously existing STIR datasets show that the Gaussian derivative networks achieve better scale generalisation than previously reported for other types of deep networks on these datasets. We first experimentally demonstrate that the Gaussian derivative networks have quite good scale generalisation properties on the new datasets, and that average pooling of feature responses over scales may sometimes lead to better results than the previously used approach of max pooling over scales. Then, we demonstrate that using a spatial max pooling mechanism after the final layer enables localisation of non-centred objects in the image domain, with maintained scale generalisation properties. We also show that regularisation during training, by applying dropout across the scale channels, referred to as scale-channel dropout, improves both the performance and the scale generalisation. In additional ablation studies, we demonstrate that discretisations of Gaussian derivative networks, based on the discrete analogue of the Gaussian kernel in combination with central difference operators, perform best or among the best, compared to a set of other discrete approximations of the Gaussian derivative kernels. Finally, by visualising the activation maps and the learned receptive fields, we demonstrate that the Gaussian derivative networks have very good explainability properties.
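To make the discretisation discussed in the ablation studies concrete, the sketch below constructs the discrete analogue of the Gaussian kernel, T(n; t) = exp(-t) I_n(t), where I_n denotes the modified Bessel functions of integer order, and combines it with central difference operators to obtain discrete Gaussian derivative kernels. The truncation radius, scale value and helper names are illustrative assumptions; this is not the authors' code.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_n(t) * exp(-t)

def discrete_gaussian(t, radius):
    """Discrete analogue of the Gaussian kernel: T(n; t) = exp(-t) * I_n(t)."""
    n = np.arange(-radius, radius + 1)
    return ive(np.abs(n), t)  # ive already includes the exp(-t) factor for t > 0

def gaussian_derivative_kernels(t, radius=8):
    """Discrete Gaussian derivative kernels: smoothing combined with central differences."""
    T = discrete_gaussian(t, radius)
    # First- and second-order central difference operators applied to the smoothing
    # kernel; since differencing and smoothing commute, this yields discrete
    # analogues of the first- and second-order Gaussian derivative kernels.
    Tx = np.convolve(T, [0.5, 0.0, -0.5], mode="same")
    Txx = np.convolve(T, [1.0, -2.0, 1.0], mode="same")
    return T, Tx, Txx

T, Tx, Txx = gaussian_derivative_kernels(t=4.0)
print(T.sum())              # close to 1 for a sufficiently large truncation radius
print(Tx.sum(), Txx.sum())  # close to 0, as for the continuous Gaussian derivatives
```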