🤖 AI Summary
While equivariant neural networks excel on symmetric tasks, their training often runs into optimization difficulties, and it remains unclear whether the root cause lies in the equivariance constraints themselves or in inadequate hyperparameter tuning.
Method: We theoretically establish that intrinsic parameter symmetries of the unconstrained model can provably prevent convergence to the globally optimal solution within the equivariant subspace. To address this, we propose *dynamic group representation relaxation*: instead of enforcing a fixed standard equivariant structure, we adaptively reselect the group representations at hidden layers based on the optimization trajectory.
Contribution/Results: Leveraging group representation theory and loss landscape geometry, we provide the first rigorous proof that symmetry-induced degeneracies obstruct optimization. Empirical validation confirms that the relaxed weights indeed correspond to a different choice of group representation in the hidden layer. Our work establishes a verifiable geometric principle for training equivariant models, bridging theory and practice in equivariant deep learning.
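The parameter symmetries referred to above are a standard phenomenon: an unconstrained MLP computes exactly the same function after its hidden neurons are permuted. The sketch below (a minimal NumPy illustration, not the paper's construction or proof) demonstrates this classic symmetry for a one-hidden-layer ReLU network.

```python
import numpy as np

# Minimal illustration (not the paper's construction): the classic
# hidden-neuron permutation symmetry of an unconstrained MLP.
# Permuting the hidden units (rows of W1, entries of b1, columns of W2)
# gives different parameters that compute the same function, so the
# loss landscape contains many equivalent copies of every solution.
rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 6, 3
W1 = rng.standard_normal((d_hidden, d_in))
b1 = rng.standard_normal(d_hidden)
W2 = rng.standard_normal((d_out, d_hidden))

def mlp(x, W1, b1, W2):
    """One-hidden-layer ReLU MLP."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

perm = rng.permutation(d_hidden)
x = rng.standard_normal(d_in)
out_original = mlp(x, W1, b1, W2)
out_permuted = mlp(x, W1[perm], b1[perm], W2[:, perm])
assert np.allclose(out_original, out_permuted)
```

These symmetry-induced degeneracies are the ingredient the theory builds on: they hold in the unconstrained model yet constrain the geometry of the equivariant subspace sitting inside it.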
📝 Abstract
Equivariant neural networks have proven to be effective for tasks with known underlying symmetries. However, optimizing equivariant networks can be tricky, and best training practices are less established than for standard networks. In particular, recent works have found small training benefits from relaxing equivariance constraints. This raises the question: do equivariance constraints introduce fundamental obstacles to optimization? Or do they simply require different hyperparameter tuning? In this work, we investigate this question through a theoretical analysis of the loss landscape geometry. We focus on networks built using permutation representations, which we can view as a subset of unconstrained MLPs. Importantly, we show that the parameter symmetries of the unconstrained model have nontrivial effects on the loss landscape of the equivariant subspace and under certain conditions can provably prevent learning of the global minimum. Further, we empirically demonstrate that, in such cases, relaxing to an unconstrained MLP can sometimes solve the issue. Interestingly, the weights eventually found via relaxation correspond to a different choice of group representation in the hidden layer. From this, we draw three key takeaways. (1) Viewing any class of networks in the context of a larger unconstrained function space can give important insights into loss landscape structure. (2) Within the unconstrained function space, equivariant networks form a complicated union of linear hyperplanes, each associated with a specific choice of internal group representation. (3) Effective relaxation of equivariance may require not only adding nonequivariant degrees of freedom, but also rethinking the fixed choice of group representations in hidden layers.
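The abstract's picture of equivariant networks as linear subspaces of the unconstrained weight space can be made concrete in the simplest case. The sketch below (an illustrative example, assuming the symmetric group acting by coordinate permutations; the paper's actual representations may differ) uses the well-known fact that a linear map equivariant to all permutations of R^n must have the tied form a·I + b·(all-ones matrix), i.e. equivariant layers occupy a two-dimensional linear subspace inside the n² unconstrained weights.

```python
import numpy as np

# Illustrative sketch, assuming S_n acts on R^n by permuting coordinates
# (a simple special case, not necessarily the paper's exact setup).
# Every linear map equivariant to all permutations has the tied form
#   W = a * I + b * J   (J = all-ones matrix),
# a 2-dim linear subspace of the full n*n unconstrained weight space.
rng = np.random.default_rng(0)
n = 5
a, b = rng.standard_normal(2)
W = a * np.eye(n) + b * np.ones((n, n))

perm = np.roll(np.arange(n), 1)   # a cyclic shift: a non-identity permutation
P = np.eye(n)[perm]               # its permutation matrix
equivariant = np.allclose(P @ W, W @ P)   # equivariance: P W = W P
assert equivariant

# A generic unconstrained weight matrix breaks this symmetry:
W_free = rng.standard_normal((n, n))
assert not np.allclose(P @ W_free, W_free @ P)
```

Different choices of hidden-layer representation give different such subspaces, which is one way to picture the abstract's "union of linear hyperplanes" inside the unconstrained function space.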