Does equivariance matter at scale?

📅 2024-10-30

🏛️ arXiv.org

📈 Citations: 16

✨ Influential: 0

🤖 AI Summary

This work investigates whether equivariant architectural design is still needed when neural networks are trained at large scale. It compares equivariant and non-equivariant transformer models on a rigid-body interaction benchmark, varying model size, training steps, and dataset size, and trains the non-equivariant models both with and without data augmentation. Key findings: (i) performance scales as a power law in compute, and equivariant models outperform their non-equivariant counterparts at every tested compute budget; (ii) the compute-optimal trade-off between model size and training steps differs between equivariant and non-equivariant models; and (iii) equivariance improves data efficiency, but training non-equivariant models with data augmentation for enough epochs can close this gap. Together, these results indicate that the optimal training configuration depends on the model class, and that structural priors such as equivariance remain useful in the compute regime studied.
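
The summary contrasts building rotation equivariance into the architecture with learning it from augmented data. The NumPy sketch below illustrates both ideas. It assumes that inputs are 3D point coordinates and targets are 3D vectors (for example, per-point forces), so both must be rotated together. It handles rotations only, not translations. The function names and the toy "models" (`centered`, `np.tanh`) are hypothetical illustrations, not the paper's code or architectures.

```python
import numpy as np


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Sample a uniformly random 3D rotation matrix (det = +1)."""
    q, r = np.linalg.qr(rng.standard_normal((3, 3)))
    # Scale each column by the sign of R's diagonal so Q is uniform on O(3).
    q = q * np.sign(np.diag(r))
    # Map reflections (det = -1) to proper rotations by flipping one column.
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q


def augment(points: np.ndarray, vectors: np.ndarray, rng: np.random.Generator):
    """Rotate input points and vector-valued targets by the SAME random rotation.

    Both arrays hold row vectors of shape (..., 3), so x -> R x becomes x @ R.T.
    """
    rot = random_rotation(rng)
    return points @ rot.T, vectors @ rot.T


def equivariance_error(f, points: np.ndarray, rng: np.random.Generator) -> float:
    """Max deviation from the equivariance condition f(R x) == R f(x) for one random R."""
    rot = random_rotation(rng)
    return float(np.max(np.abs(f(points @ rot.T) - f(points) @ rot.T)))


def centered(points: np.ndarray) -> np.ndarray:
    """Toy rotation-equivariant map (hypothetical): positions relative to the centroid."""
    return points - points.mean(axis=0, keepdims=True)


rng = np.random.default_rng(0)
points = rng.standard_normal((8, 3))  # e.g. 8 points sampled on interacting bodies
print(equivariance_error(centered, points, rng))  # only floating-point round-off
print(equivariance_error(np.tanh, points, rng))   # elementwise tanh is not equivariant
```

An equivariant model satisfies the check for every input by construction. A non-equivariant model trained on `augment`-ed data can only approximate it, and the abstract reports that this approximation can catch up in accuracy given sufficient epochs.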

📝 Abstract

Given large data sets and sufficient compute, is it beneficial to design neural architectures for the structure and symmetries of each problem? Or is it more efficient to learn them from data? We study empirically how equivariant and non-equivariant networks scale with compute and training samples. Focusing on a benchmark problem of rigid-body interactions and on general-purpose transformer architectures, we perform a series of experiments, varying the model size, training steps, and dataset size. We find evidence for three conclusions. First, equivariance improves data efficiency, but training non-equivariant models with data augmentation can close this gap given sufficient epochs. Second, scaling with compute follows a power law, with equivariant models outperforming non-equivariant ones at each tested compute budget. Finally, the optimal allocation of a compute budget onto model size and training duration differs between equivariant and non-equivariant models.
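
The abstract reports that performance scales with compute as a power law. A common way to estimate such a law is ordinary least squares on log-log axes, because L = a·C^(-b) becomes the straight line log L = log a - b·log C. The sketch below assumes this pure two-parameter form. The paper's exact fitting procedure is not described here, and the data in the example are synthetic.

```python
import numpy as np


def fit_power_law(compute: np.ndarray, loss: np.ndarray) -> tuple[float, float]:
    """Fit loss ~ a * compute**(-b) by linear least squares in log-log space.

    Returns (a, b). np.polyfit with deg=1 returns [slope, intercept].
    """
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
    return float(np.exp(intercept)), float(-slope)


def predict(compute, a: float, b: float) -> np.ndarray:
    """Evaluate the fitted power law at new compute budgets."""
    return a * np.asarray(compute, dtype=float) ** (-b)


# Synthetic data (made up, not from the paper): a = 3, b = 0.1, ~2% multiplicative noise.
rng = np.random.default_rng(0)
compute = np.logspace(15, 20, 6)
loss = 3.0 * compute ** -0.1 * np.exp(0.02 * rng.standard_normal(compute.size))
a_hat, b_hat = fit_power_law(compute, loss)
print(f"a ~ {a_hat:.2f}, b ~ {b_hat:.3f}")
```

Fitting one curve per model class makes the abstract's comparison concrete: equivariant models "outperforming at each tested compute budget" means their fitted curve lies below the non-equivariant one across the measured range. If the curves flatten toward an irreducible loss, a form with an additive constant, fit by nonlinear least squares (for example `scipy.optimize.curve_fit`), may be more appropriate.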

Problem

Research questions this paper addresses.

Does equivariance improve neural network performance at scale?

Can data augmentation replace equivariant design in large models?

How does compute budget allocation differ for equivariant vs non-equivariant models?

Innovation

Methods, ideas, or system contributions that make the work stand out.

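Equivariance improves data efficiency, although data augmentation with enough epochs lets non-equivariant models close the gap

Performance scales as a power law in compute, with equivariant models ahead at every tested compute budget

The compute-optimal split between model size and training steps differs for equivariant and non-equivariant models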
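
The last point concerns how to split a fixed compute budget between model size and training duration. The following is a minimal sketch under two explicit assumptions that are not taken from the paper. First, loss follows the additive parametric form used in Chinchilla-style scaling analyses, L(N, S) = E + A·N^(-α) + B·S^(-β), where N is the parameter count and S is the number of training steps. Second, compute is proportional to N·S, with constant factors such as batch size folded into the units of C. Substituting S = C/N and setting dL/dN = 0 gives N* = (αA·C^β / (βB))^(1/(α+β)). This is the unique minimum, because the loss is convex in log N. All parameter values below are made up.

```python
import numpy as np


def loss_model(n_params, steps, A, alpha, B, beta, E=0.0):
    """Hypothetical parametric loss: L(N, S) = E + A * N**-alpha + B * S**-beta."""
    return E + A * np.asarray(n_params, dtype=float) ** -alpha + B * np.asarray(steps, dtype=float) ** -beta


def compute_optimal_split(compute, A, alpha, B, beta):
    """Minimise L(N, C / N) over model size N, assuming compute C = N * S.

    dL/dN = 0  =>  N**(alpha + beta) = alpha * A * C**beta / (beta * B).
    The constant E does not move the minimiser, so it is not needed here.
    """
    n_opt = (alpha * A * compute ** beta / (beta * B)) ** (1.0 / (alpha + beta))
    return n_opt, compute / n_opt


# Two hypothetical model families with made-up exponents: with different
# (alpha, beta), the same budget is split differently between N and S.
budget = 1e12
families = {
    "family 1": (2.0, 0.3, 5.0, 0.4),
    "family 2": (2.0, 0.4, 5.0, 0.3),
}
for name, (A, alpha, B, beta) in families.items():
    n, s = compute_optimal_split(budget, A, alpha, B, beta)
    print(f"{name}: N* = {n:.3g} parameters, S* = {s:.3g} steps")
```

Under these assumptions, N* grows as C^(β/(α+β)). Different fitted exponents for two model classes therefore imply different compute-optimal allocations, which is the kind of divergence the list item describes.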

🔎 Similar Papers

Improving Equivariant Model Training via Constraint Relaxation