🤖 AI Summary
Standard convolutions lack geometric adaptability because their kernel structure is fixed, while existing adaptive methods lack a unified theoretical foundation. To address this, we propose **Metric Convolution**, which models images as 2D manifolds equipped with local and geodesic distances, either symmetric (Riemannian metrics) or not (Finsler metrics), and takes kernel sampling positions as samples of the unit balls of these metrics, yielding signal-dependent, geometrically regularized convolutions. We provide a unifying metric-based framework for adaptive convolution, in which the kernel positions of existing methods are samples of unit balls of implicit metrics, and we construct interpretable operators from explicit signal-dependent metrics that remain compatible with gradient-based optimization. The resulting operators can directly replace existing convolutions on input images or on deep features. Metric convolutions typically require fewer parameters and generalize better, and they achieve competitive performance on standard image denoising and classification tasks.
📝 Abstract
Standard convolutions are prevalent in image processing and deep learning, but their fixed kernel design limits adaptability. Several deformation strategies of the reference kernel grid have been proposed. Yet, they lack a unified theoretical framework. By returning to a metric perspective for images, now seen as two-dimensional manifolds equipped with notions of local and geodesic distances, either symmetric (Riemannian metrics) or not (Finsler metrics), we provide a unifying principle: the kernel positions are samples of unit balls of implicit metrics. With this new perspective, we also propose metric convolutions, a novel approach that samples unit balls from explicit signal-dependent metrics, providing interpretable operators with geometric regularisation. This framework, compatible with gradient-based optimisation, can directly replace existing convolutions applied to either input images or deep features of neural networks. Metric convolutions typically require fewer parameters and provide better generalisation. Our approach shows competitive performance in standard denoising and classification tasks.
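To make the unifying principle concrete, the following is a minimal sketch of sampling kernel offsets from the unit ball of a Riemannian metric at a single pixel. It is not the authors' implementation: the function names `unit_ball_offsets` and `metric_norm_sq`, the polar sampling scheme, and the parameters `n_radii` and `n_angles` are assumptions made for illustration. The metric is taken to be a symmetric positive definite 2x2 matrix M = [[a, b], [b, c]], and its unit ball is the set of offsets v with vᵀMv ≤ 1. The asymmetric Finsler case is not covered.

```python
import math


def unit_ball_offsets(a, b, c, n_radii=2, n_angles=8):
    """Sample offsets v with v^T M v <= 1 for M = [[a, b], [b, c]] (hypothetical helper).

    A polar grid on the Euclidean unit disc is mapped through M^(-1/2),
    so each sample u with |u| <= 1 becomes v = M^(-1/2) u with v^T M v = |u|^2.
    """
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("metric must be symmetric positive definite")

    # Closed-form square root of a 2x2 SPD matrix: sqrt(M) = (M + s I) / t,
    # with s = sqrt(det M) and t = sqrt(trace M + 2 s).
    s = math.sqrt(det)
    t = math.sqrt(a + c + 2.0 * s)
    p, q, r = (a + s) / t, b / t, (c + s) / t

    # Inverse of sqrt(M); its determinant equals s.
    ip, iq, ir = r / s, -q / s, p / s

    offsets = [(0.0, 0.0)]  # kernel centre
    for i in range(1, n_radii + 1):
        radius = i / n_radii
        for j in range(n_angles):
            theta = 2.0 * math.pi * j / n_angles
            ux, uy = radius * math.cos(theta), radius * math.sin(theta)
            offsets.append((ip * ux + iq * uy, iq * ux + ir * uy))
    return offsets


def metric_norm_sq(a, b, c, v):
    """Squared length v^T M v of an offset v under the metric M."""
    x, y = v
    return a * x * x + 2.0 * b * x * y + c * y * y


# Identity metric: samples fill the ordinary Euclidean unit disc.
isotropic = unit_ball_offsets(1.0, 0.0, 1.0)

# Anisotropic metric diag(4, 1): horizontal steps cost more,
# so the sampled support is squeezed along x.
anisotropic = unit_ball_offsets(4.0, 0.0, 1.0)
```

In a signal-dependent setting the entries `a`, `b`, `c` would vary per pixel and be computed from the image or feature map, and the resulting offsets would typically be used as fractional sampling positions with interpolation. Those steps are outside this sketch.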