🤖 AI Summary
This work examines the comparatively underexplored role of the optimizer in training equivariant and geometric neural networks, which encode geometric symmetries by construction but are often difficult to optimize and can underperform less constrained models. The authors compare Muon and Adam across several equivariant and geometric architectures on point cloud and molecular learning tasks. On ModelNet40, where the comparison is clearest, Muon consistently improves over Adam across all architectures considered. Analysis of the trained ModelNet40 checkpoints through Hessian estimates, loss surface visualizations, and spectral measures (stable and effective rank) shows that the solutions reached by Muon have larger Hessian curvature summaries but more regular loss surfaces, along with higher-rank learned weights and intermediate representations. These observations suggest that the interaction between optimizer design and geometric inductive bias deserves further attention.
📝 Abstract
Equivariant neural networks encode geometric symmetries by construction, yet they are often difficult to optimize and can underperform less constrained architectures. A growing body of work addresses this through architectural modifications such as constraint relaxation or approximate equivariance, while the role of the optimizer remains comparatively underexplored. We study this direction by comparing Muon and Adam across several equivariant and geometric architectures in point cloud and molecular learning settings. On ModelNet40, where the comparison is clearest, Muon consistently improves over Adam across all architectures considered. We then analyze the trained ModelNet40 checkpoints through Hessian estimates, loss surface visualizations, and spectral properties of learned weights and intermediate representations. The checkpoints reached by Muon have larger Hessian curvature summaries but more regular loss surfaces, and their learned weights and representations have higher stable and effective ranks. These observations suggest that the interaction between optimizer design and geometric inductive bias deserves further attention from the community.
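The abstract does not spell out how stable and effective rank are computed. As a minimal sketch, assuming the commonly used definitions (stable rank as the squared Frobenius norm divided by the squared spectral norm, and effective rank as the exponential of the Shannon entropy of the normalized singular values), both quantities can be obtained from the singular values of a weight or feature matrix. The function names below are hypothetical, and the singular values are assumed to come from an SVD computed elsewhere; the paper may use a different variant.

```python
import math


def stable_rank(singular_values):
    """Stable rank: ||W||_F^2 / ||W||_2^2 = sum(s_i^2) / max(s_i)^2.

    Assumed definition; not confirmed by the abstract.
    """
    top = max(singular_values)
    if top <= 0:
        raise ValueError("need at least one positive singular value")
    return sum(s * s for s in singular_values) / (top * top)


def effective_rank(singular_values):
    """Effective rank: exp of the Shannon entropy of p_i = s_i / sum_j s_j.

    Assumed definition; not confirmed by the abstract.
    """
    total = sum(singular_values)
    if total <= 0:
        raise ValueError("need at least one positive singular value")
    entropy = 0.0
    for s in singular_values:
        p = s / total
        if p > 0:  # zero singular values contribute nothing to the entropy
            entropy -= p * math.log(p)
    return math.exp(entropy)


# Illustrative spectra (made up, not taken from the paper).
flat_spectrum = [1.0, 1.0, 1.0, 1.0]    # all directions equally used
peaked_spectrum = [1.0, 0.0, 0.0]       # a single dominant direction

flat_stable = stable_rank(flat_spectrum)
flat_effective = effective_rank(flat_spectrum)
peaked_stable = stable_rank(peaked_spectrum)
peaked_effective = effective_rank(peaked_spectrum)
```

Under these definitions a flat spectrum gives both ranks equal to the number of singular values, while a spectrum dominated by one direction gives both ranks close to 1, which is the sense in which "higher rank" indicates that more directions of the weight or feature space are in use.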