🤖 AI Summary
This work addresses the problem of learning identifiable equivariant embeddings from unlabeled observations of group actions, without assuming prior knowledge of the group structure or group-specific inductive biases. The proposed method is an end-to-end contrastive learning framework that jointly optimizes an encoder and a group representation, modeling group actions as invertible linear maps and enforcing equivariance directly in the latent space. Theoretically, the learned representations are proven to be identifiable up to linear equivalence under mild conditions. Empirically, experiments on dSprites demonstrate high-fidelity recovery of a product group combining discrete rotations and periodic translations, and experiments on synthetic data extend the evaluation to the non-abelian groups O(n) and GL(n). According to the authors, this is the first demonstration of general-purpose, encoder-only equivariant representation learning solely from observed group actions, without architectural constraints tied to specific symmetry priors.
📝 Abstract
We propose Equivariance by Contrast (EbC) to learn equivariant embeddings from observation pairs $(\mathbf{y}, g \cdot \mathbf{y})$, where $g$ is drawn from a finite group acting on the data. Our method jointly learns a latent space and a group representation in which group actions correspond to invertible linear maps -- without relying on group-specific inductive biases. We validate our approach on the infinite dSprites dataset with structured transformations defined by the finite group $G := (R_m \times \mathbb{Z}_n \times \mathbb{Z}_n)$, combining discrete rotations and periodic translations. The resulting embeddings exhibit high-fidelity equivariance, with group operations faithfully reproduced in latent space. On synthetic data, we further validate the approach on the non-abelian orthogonal group $O(n)$ and the general linear group $GL(n)$. We also provide a theoretical proof for identifiability. While broad evaluation across diverse group types on real-world data remains future work, our results constitute the first successful demonstration of general-purpose encoder-only equivariant learning from group action observations alone, including non-trivial non-abelian groups and a product group motivated by modeling affine equivariances in computer vision.
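
To make the phrase "group actions correspond to invertible linear maps" concrete, the following is a minimal NumPy sketch of the equivariance condition $f(g \cdot \mathbf{y}) = \rho(g) f(\mathbf{y})$ on a toy problem. It is not the EbC training procedure: the encoder is a hypothetical fixed invertible linear map `A`, the group is $\mathbb{Z}_n$ acting by circular shifts, and $\rho(g)$ is fitted by ordinary least squares rather than by a contrastive objective. All names (`act`, `encode`, `rho`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6   # signal length; the toy group is Z_n acting by circular shifts
k = 2   # the group element g: shift by k positions

def act(Y, k):
    """Group action g . y: circular shift of each row by k positions."""
    return np.roll(Y, k, axis=-1)

# Hypothetical stand-in encoder f(y) = A y (assumption: fixed, linear,
# invertible; in the paper the encoder is learned, not given).
A = rng.normal(size=(n, n)) + n * np.eye(n)

def encode(Y):
    """Apply f to each row of Y."""
    return Y @ A.T

# Observation pairs (y, g . y) and their embeddings.
Y = rng.normal(size=(200, n))
Z = encode(Y)
Z_g = encode(act(Y, k))

# Fit rho(g) by least squares so that Z_g ~= Z @ rho.T
# (a swapped-in estimator, not the contrastive loss).
rho_T, *_ = np.linalg.lstsq(Z, Z_g, rcond=None)
rho = rho_T.T

# For this toy encoder the latent action has a closed form:
# rho(g) = A P A^{-1}, where P is the permutation matrix of the shift.
P = np.roll(np.eye(n), k, axis=0)
rho_expected = A @ P @ np.linalg.inv(A)

equivariance_error = np.max(np.abs(Z_g - Z @ rho.T))
print("max equivariance error:", equivariance_error)
print("matches A P A^-1:", np.allclose(rho, rho_expected, atol=1e-6))
```

In this toy setting the fitted map is the shift matrix conjugated by `A`, which illustrates why a latent group representation can only be pinned down up to an invertible linear change of basis, the kind of ambiguity the identifiability statement refers to.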