🤖 AI Summary
This study investigates the relationship between end-to-end equivariance and layer-wise equivariance in deep neural networks, offering a theoretical explanation for the empirically observed phenomenon in which weights spontaneously develop equivariant structure during training. Under a parameter identifiability assumption, the work gives the first rigorous proof that if a network's end-to-end function is equivariant under group actions on the input and output spaces, then there exists a parameter choice, realizing the same function, under which every layer is equivariant with respect to induced group actions on the latent spaces. Leveraging tools from group actions, parameter identifiability, and abstract algebra, the authors construct an architecture-agnostic theoretical framework applicable to a broad class of identifiable networks. This framework provides a solid mathematical foundation for the emergence of equivariant structures in practice and reveals the intrinsic mechanism by which equivariance naturally arises during training.
📝 Abstract
We investigate the relation between end-to-end equivariance and layerwise equivariance in deep neural networks. We prove the following: For a network whose end-to-end function is equivariant with respect to group actions on the input and output spaces, there is a parameter choice yielding the same end-to-end function such that its layers are equivariant with respect to some group actions on the latent spaces. Our result assumes that the parameters of the model are identifiable in an appropriate sense. This identifiability property has been established in the literature for a large class of networks, to which our results apply immediately, while it is conjectural for others. The theory we develop is grounded in an abstract formalism, and is therefore architecture-agnostic. Overall, our results provide a mathematical explanation for the emergence of equivariant structures in the weights of neural networks during training -- a phenomenon that is consistently observed in practice.
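The statement can be made concrete on a toy instance. The sketch below is a hypothetical illustration, not the paper's construction: a two-layer linear network `f(x) = W2 @ W1 @ x` whose end-to-end map is equivariant to a cyclic permutation acting on the input and output. Because `W1` is invertible (a stand-in for the identifiability assumption), the conjugate action `rho = W1 @ P @ inv(W1)` on the latent space makes each layer individually equivariant.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Input/output group action: cyclic shift permutation (P x)[i] = x[(i+1) % n].
P = np.eye(n)[[1, 2, 3, 0]]
Q = P  # same action on the output space

# Build an end-to-end map M that is equivariant by construction:
# matrices of the form M[i, j] = c[(j - i) % n] (circulant) commute
# with the cyclic shift P, so M @ P = P @ M = Q @ M.
c = rng.standard_normal(n)
M = np.array([np.roll(c, k) for k in range(n)])

# Factor M into two layers with W1 invertible (identifiability stand-in).
W1 = rng.standard_normal((n, n))       # invertible with probability 1
W2 = M @ np.linalg.inv(W1)             # so that W2 @ W1 == M

# End-to-end equivariance: f(P x) = Q f(x).
x = rng.standard_normal(n)
assert np.allclose(W2 @ W1 @ (P @ x), Q @ (W2 @ W1 @ x))

# Induced latent action: conjugate the input action through layer 1.
rho = W1 @ P @ np.linalg.inv(W1)

# Each layer is now equivariant with respect to the latent action rho.
assert np.allclose(W1 @ P, rho @ W1)   # layer 1: W1 intertwines P and rho
assert np.allclose(W2 @ rho, Q @ W2)   # layer 2: W2 intertwines rho and Q
print("layer-wise equivariance verified")
```

In this linear toy case the latent action is simply a change of basis of the input action; the paper's contribution is showing that an analogous parameter choice exists for general identifiable networks, where the argument is far less direct.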