🤖 AI Summary
This work investigates the expressive capacity and learning dynamics of bias-free ReLU networks. Method: Through theoretical analysis and symmetry-based modeling, we establish that the only odd functions two-layer bias-free ReLU networks can express are linear ones, and that under symmetric data distributions their gradient-flow dynamics admit closed-form analytical solutions. We further construct an equivalence framework linking bias-free (leaky) ReLU networks to linear networks across dynamic evolution, generalization behavior, and non-lazy training regimes. Contribution/Results: We show that, even with increased depth, such networks share key properties with deep linear networks. Our findings provide a theoretical lens for understanding latent linear mechanisms in "apparently nonlinear" neural networks, revealing how structural constraints (e.g., absence of bias) shape both expressivity and optimization landscapes.
📝 Abstract
We investigate the implications of removing bias in ReLU networks for their expressivity and learning dynamics. We first show that two-layer bias-free ReLU networks have limited expressivity: the only odd function such networks can express is a linear one. We then show that, under symmetry conditions on the data, these networks have the same learning dynamics as linear networks. This enables us to give analytical time-course solutions for certain two-layer bias-free (leaky) ReLU networks outside the lazy learning regime. While deep bias-free ReLU networks are more expressive than their two-layer counterparts, they still share a number of similarities with deep linear networks. These similarities enable us to leverage insights from linear networks to understand certain ReLU networks. Overall, our results show that some properties previously established for bias-free ReLU networks arise due to equivalence to linear networks.
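The expressivity claim above has a short mechanistic explanation: for a bias-free unit, ReLU(z) − ReLU(−z) = z, so the odd part of any two-layer bias-free ReLU network collapses to a linear map. The sketch below checks this numerically (the network sizes and random weights are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random two-layer bias-free ReLU network: f(x) = a^T ReLU(W x).
# d (input dim) and h (hidden width) are arbitrary choices for illustration.
d, h = 5, 64
W = rng.normal(size=(h, d))
a = rng.normal(size=h)

def f(x):
    return a @ np.maximum(W @ x, 0.0)

def f_odd(x):
    # Odd part of f. Because ReLU(z) - ReLU(-z) = z for every coordinate,
    # this should equal the linear map x -> 0.5 * (a^T W) x.
    return 0.5 * (f(x) - f(-x))

v = 0.5 * (a @ W)  # the induced linear map

for _ in range(5):
    x = rng.normal(size=d)
    assert np.isclose(f_odd(x), v @ x)
```

So any odd function the network represents must already be linear, consistent with the abstract's expressivity result.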