🤖 AI Summary
Problem: Many graph-structured domains have no single global symmetry, only local ones, which makes equivariant learning difficult when no global coordinate system is available.
Method: We propose Torsor CNNs, a framework that connects local symmetry modeling with group synchronization. Local geometric relationships are encoded as edge-wise torsors (group-valued coordinate transformations between adjacent nodes), which enables provably equivariant learning without global coordinates. The framework provides Torsor convolution layers and a frustration loss, and it unifies and generalizes classical CNNs, gauge-equivariant CNNs, and related architectures. By explicitly modeling group-valued edge transformations, the approach enforces consistency between neighboring local coordinate frames.
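The equivariance property can be illustrated with a minimal NumPy sketch. It is not the authors' Torsor Convolutional Layer: it assumes the group is SO(3), that each node carries a feature array of shape (3, C) whose first axis is rotated by the group, and that the edge transformation g_ij maps coordinates in node j's frame to node i's frame. The names `random_rotation`, `torsor_conv`, and `change_frames` are hypothetical.

```python
import numpy as np


def random_rotation(rng):
    """Sample a 3x3 rotation matrix (an element of SO(3)) via QR."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))  # fix column signs of the QR factor
    if np.linalg.det(q) < 0:  # flip one axis so that det(q) = +1
        q[:, 0] = -q[:, 0]
    return q


def torsor_conv(features, edges, potentials, w_self, w_nbr):
    """Hypothetical message-passing step that transports neighbor features.

    features:   {node: (3, C) array}; rotations act on axis 0.
    edges:      list of (i, j); node i receives a message from node j.
    potentials: {(i, j): 3x3 rotation g_ij} mapping frame j to frame i.
    w_self, w_nbr: (C, C_out) channel-mixing weights acting on axis 1 only,
                   so they commute with rotations acting on axis 0.
    """
    out = {i: f @ w_self for i, f in features.items()}
    for i, j in edges:
        # Express neighbor j's features in node i's frame, then mix channels.
        out[i] = out[i] + potentials[(i, j)] @ features[j] @ w_nbr
    return out


def change_frames(features, potentials, frames):
    """Apply a local change of frame k_i at every node (a gauge transformation)."""
    new_feats = {i: frames[i] @ f for i, f in features.items()}
    new_pots = {(i, j): frames[i] @ g @ frames[j].T
                for (i, j), g in potentials.items()}
    return new_feats, new_pots


# Usage: a 3-node path graph with random potentials, features, and weights.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
g01, g12 = random_rotation(rng), random_rotation(rng)
pots = {(0, 1): g01, (1, 0): g01.T, (1, 2): g12, (2, 1): g12.T}
feats = {i: rng.normal(size=(3, 4)) for i in range(3)}
w_self, w_nbr = rng.normal(size=(4, 5)), rng.normal(size=(4, 5))
frames = {i: random_rotation(rng) for i in range(3)}

out = torsor_conv(feats, edges, pots, w_self, w_nbr)
feats2, pots2 = change_frames(feats, pots, frames)
out2 = torsor_conv(feats2, edges, pots2, w_self, w_nbr)
# Equivariance: the output computed in the new frames equals k_i applied to out_i.
err = max(np.abs(out2[i] - frames[i] @ out[i]).max() for i in range(3))
print(err < 1e-10)  # True
```

The channel-mixing weights act on a different axis from the rotations, so they commute with every g_ij; as a result the output at node i picks up exactly the same k_i as its input. A learned weight acting on the rotated axis would generally break equivariance unless it commuted with the group action.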
Results: The framework is applied to multi-view 3D recognition, where relative camera poses naturally define the edge torsors, illustrating how it supports locally equivariant representation learning on data whose symmetries are local rather than global.
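The camera-pose construction can be sketched as follows, assuming world-to-camera rotation matrices R_i (so a world point x appears as R_i x in camera i) and ignoring translations. The relative rotation g_ij = R_i R_j^T then maps camera-j coordinates to camera-i coordinates. Composing potentials around a cycle gives a synchronization-style consistency check: potentials derived from global poses compose to the identity, while a perturbed edge leaves a nonzero discrepancy. The helper names are hypothetical, and this is not the paper's pipeline.

```python
import numpy as np


def rotation_x(theta):
    """Rotation by theta radians about the x-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rotation_z(theta):
    """Rotation by theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def edge_potentials(world_to_cam, edges):
    """Relative rotations g_ij = R_i R_j^T for each edge (i, j).

    If camera i sees a world point x as x_i = R_i x, then x_i = R_i R_j^T x_j,
    so g_ij maps camera-j coordinates to camera-i coordinates.
    """
    return {(i, j): world_to_cam[i] @ world_to_cam[j].T for i, j in edges}


def cycle_holonomy(potentials, cycle):
    """Compose potentials along a closed cycle [v0, v1, ..., v0]."""
    h = np.eye(3)
    for a, b in zip(cycle[:-1], cycle[1:]):
        h = h @ potentials[(a, b)]
    return h


# Usage: three cameras with hypothetical world-to-camera rotations.
poses = {0: np.eye(3), 1: rotation_z(0.3), 2: rotation_x(-0.5) @ rotation_z(1.1)}
pots = edge_potentials(poses, [(0, 1), (1, 2), (2, 0)])
h = cycle_holonomy(pots, [0, 1, 2, 0])
print(np.allclose(h, np.eye(3)))  # True: potentials from global poses are consistent

pots[(1, 2)] = pots[(1, 2)] @ rotation_z(0.05)  # perturb one relative pose
h = cycle_holonomy(pots, [0, 1, 2, 0])
print(np.allclose(h, np.eye(3)))  # False: the cycle is now frustrated
```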
📝 Abstract
Most equivariant neural networks rely on a single global symmetry, limiting their use in domains where symmetries are instead local. We introduce Torsor CNNs, a framework for learning on graphs with local symmetries encoded as edge potentials: group-valued transformations between neighboring coordinate frames. We establish that this geometric construction is fundamentally equivalent to the classical group synchronization problem, yielding (1) a Torsor Convolutional Layer that is provably equivariant to local changes in coordinate frames, and (2) the frustration loss, a standalone geometric regularizer that encourages locally equivariant representations when added to any neural network's training objective. The Torsor CNN framework unifies and generalizes several architectures, including classical CNNs and Gauge CNNs on manifolds, by operating on arbitrary graphs without requiring a global coordinate system or smooth manifold structure. We establish the mathematical foundations of this framework and demonstrate its applicability to multi-view 3D recognition, where relative camera poses naturally define the required edge potentials.
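A frustration-style regularizer can be sketched in the same graph setting. The abstract does not give the loss's exact form, so the squared-norm penalty below, the (N, 3, C) feature layout, the SO(3) potentials, and the names `frustration_loss`, `total_loss`, and `weight` are all assumptions for illustration, not the paper's definition.

```python
import numpy as np


def random_rotation(rng):
    """Sample a 3x3 rotation matrix (an element of SO(3)) via QR."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q @ np.diag(np.sign(np.diag(r)))  # fix column signs of the QR factor
    if np.linalg.det(q) < 0:  # flip one axis so that det(q) = +1
        q[:, 0] = -q[:, 0]
    return q


def frustration_loss(features, potentials):
    """Assumed form: mean over edges of ||f_i - g_ij f_j||^2 (squared Frobenius).

    features:   (N, 3, C) array; entry i holds node i's features in its own frame.
    potentials: {(i, j): 3x3 rotation g_ij} mapping frame j to frame i.
    The penalty is zero exactly when features agree after transport on every edge.
    """
    terms = [np.sum((features[i] - g @ features[j]) ** 2)
             for (i, j), g in potentials.items()]
    return float(np.mean(terms))


def total_loss(task_loss, features, potentials, weight=0.1):
    """Add the penalty to any task loss; `weight` is an illustrative choice."""
    return task_loss + weight * frustration_loss(features, potentials)


# Usage: a 4-cycle whose potentials come from hypothetical global frames.
rng = np.random.default_rng(0)
n, edges = 4, [(0, 1), (1, 2), (2, 3), (3, 0)]
frames = [random_rotation(rng) for _ in range(n)]
pots = {(i, j): frames[i] @ frames[j].T for i, j in edges}
shared = rng.normal(size=(3, 2))                     # one feature in world coordinates
consistent = np.stack([R @ shared for R in frames])  # that feature seen from each frame
noisy = rng.normal(size=(n, 3, 2))
print(np.isclose(frustration_loss(consistent, pots), 0.0))  # True

# A local change of frame rotates every residual by k_i, so the loss is unchanged.
ks = [random_rotation(rng) for _ in range(n)]
moved = np.stack([ks[i] @ noisy[i] for i in range(n)])
moved_pots = {(i, j): ks[i] @ g @ ks[j].T for (i, j), g in pots.items()}
print(np.isclose(frustration_loss(noisy, pots), frustration_loss(moved, moved_pots)))  # True
```

Under a local change of frame, each per-edge residual f_i - g_ij f_j becomes k_i(f_i - g_ij f_j), and an orthogonal k_i preserves its norm. That is why a penalty of this shape can be added to a training objective without depending on the arbitrary choice of local frames.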