🤖 AI Summary
Empirical studies show that implicit models, which pair small architectures with many inference-time iterations, can match the performance of much larger explicit networks; however, no theoretical explanation exists for how their expressive power scales with the compute budget spent at inference.
Method: This work provides the first nonparametric analysis establishing that implicit models, via parameter-shared fixed-point iterations, asymptotically approximate increasingly complex function classes, with expressivity increasing monotonically in the number of inference iterations. The analysis integrates tools from implicit differentiation, fixed-point theory, and nonparametric statistics.
Results: We validate the theory across diverse tasks, including image reconstruction, scientific computing, and operations research, demonstrating simultaneous improvements in solution quality and effective model capacity. The framework enables constant-memory training and dynamically controllable inference, offering both a rigorous theoretical foundation and a practical paradigm for efficient implicit modeling.
📝 Abstract
Implicit models, an emerging model class, compute outputs by iterating a single parameter block to a fixed point. This architecture realizes an infinite-depth, weight-tied network that trains with constant memory, substantially reducing memory requirements relative to explicit models at the same performance level. While these compact models are empirically known to match or even exceed larger explicit networks when allocated more test-time compute, the underlying mechanism remains poorly understood.
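To make the "iterate a single parameter block to a fixed point" idea concrete, here is a minimal NumPy sketch of one possible weight-tied block, `z → tanh(W z + b + x)`, iterated until the state stops changing. This is an illustrative toy, not the paper's architecture; all names (`implicit_forward`, `W`, `b`) are my own.

```python
import numpy as np

def implicit_forward(W, b, x, n_iters=100, tol=1e-8):
    """Iterate the single weight-tied block z -> tanh(W @ z + b + x)
    until it (approximately) reaches a fixed point z*."""
    z = np.zeros_like(x)
    for _ in range(n_iters):
        z_new = np.tanh(W @ z + b + x)  # the same parameters at every "layer"
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

rng = np.random.default_rng(0)
d = 8
A = rng.standard_normal((d, d))
W = A / (2.0 * np.linalg.norm(A, 2))  # spectral norm 1/2, so the map contracts
b = rng.standard_normal(d)
x = rng.standard_normal(d)

z_star = implicit_forward(W, b, x)
# z* approximately solves z = tanh(W z + b + x): the output of an
# "infinite-depth" network, while only the current state is ever stored
residual = np.linalg.norm(z_star - np.tanh(W @ z_star + b + x))
```

Because tanh is 1-Lipschitz and the spectral norm of `W` is below one, the iteration map is a contraction, so Banach's fixed-point theorem guarantees convergence; memory stays constant because only the current state `z` is kept, however many iterations are run.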
We study this gap through a nonparametric analysis of expressive power. We give a rigorous mathematical characterization showing that a simple, regular implicit operator can, through iteration, progressively express more complex mappings. For a broad class of implicit models, we prove that this process lets the model's expressive power scale with test-time compute, ultimately matching a much richer function class. We validate the theory in three domains: image reconstruction, scientific computing, and operations research, observing that as test-time iterations increase, the complexity of the learned mapping rises while the solution quality simultaneously improves and stabilizes.