🤖 AI Summary
This work addresses the adverse effects of weight quantization in low-precision deployment, which can compromise the existence and uniqueness of solutions in monotone operator equilibrium networks and undermine solver convergence. For the first time, the authors model quantization effects through the lens of spectral perturbation theory, establishing a precise relationship between quantization-induced perturbations and the monotonicity margin. Building on this insight, they derive a margin-based criterion for convergence and provide associated error bounds. Furthermore, they introduce a condition number to characterize the interplay between quantization precision and forward error. Combining quantization-aware training with fixed-point analysis, they empirically validate their theoretical phase-transition threshold on MNIST: post-training quantization converges at 5 bits or higher but diverges at 4 bits and below, whereas quantization-aware training restores provable convergence even at 4 bits.
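The convergence criterion described above (spectral-norm weight perturbation smaller than the monotonicity margin) is easy to probe numerically. A minimal NumPy sketch, under illustrative assumptions that need not match the paper's exact setup: a symmetric uniform quantizer, and the standard monDEQ-style parameterization `W = (1 - m)*I - A.T @ A`, which guarantees monotonicity margin `m`:

```python
import numpy as np

def uniform_quantize(W, bits):
    # Symmetric uniform quantizer (illustrative assumption, not the paper's exact scheme).
    scale = np.max(np.abs(W)) / (2 ** (bits - 1) - 1)
    return np.round(W / scale) * scale

rng = np.random.default_rng(0)
n, m = 64, 0.5                       # m: monotonicity margin
A = rng.standard_normal((n, n)) / np.sqrt(n)
W = (1 - m) * np.eye(n) - A.T @ A    # guarantees I - (W + W.T)/2 >= m*I

for bits in (3, 4, 5, 6, 8):
    # Spectral norm of the quantization-induced perturbation dW = Q(W) - W.
    pert = np.linalg.norm(uniform_quantize(W, bits) - W, 2)
    status = "convergence guaranteed" if pert < m else "margin may be violated"
    print(f"{bits}-bit: ||dW||_2 = {pert:.4f} -> {status}")
```

As bit width grows, the quantization step shrinks roughly geometrically, so the perturbation eventually drops below the margin; the bit width where the crossover happens is the phase-transition threshold the paper predicts.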
📝 Abstract
Monotone operator equilibrium networks are implicit-layer models whose output is the unique equilibrium of a monotone operator, guaranteeing existence, uniqueness, and convergence. When deployed on low-precision hardware, weights are quantized, potentially destroying these guarantees. We analyze weight quantization as a spectral perturbation of the underlying monotone inclusion and establish three results: convergence of the quantized solver is guaranteed whenever the spectral-norm weight perturbation is smaller than the monotonicity margin; the displacement between the quantized and full-precision equilibria is bounded in terms of the perturbation size and the margin; and a condition number, the ratio of the operator norm to the margin, links quantization precision to forward error. MNIST experiments confirm a phase transition at the predicted threshold: post-training quantization at three and four bits diverges, while five bits and above converge. A backward-pass guarantee additionally enables quantization-aware training, which recovers provable convergence at four bits.
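The equilibrium-displacement bound can also be checked empirically. A minimal sketch under simplifying assumptions (ReLU activation, plain Picard iteration, symmetric `W` scaled so that `||W||_2 <= 1 - m`, and a uniform quantizer; none of these need match the paper's setup): for fixed points `z* = relu(W z* + x)` and `z_q = relu(W_q z_q + x)`, the triangle inequality gives `||z_q - z*|| <= ||dW||_2 * ||z_q|| / m`.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 32, 0.5
A = 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)
W = (1 - m) * np.eye(n) - A.T @ A    # symmetric; here ||W||_2 <= 1 - m < 1
x = rng.standard_normal(n)

def equilibrium(Wmat, x, iters=500):
    # Picard iteration z <- relu(W z + x); a contraction since ||Wmat||_2 < 1 here.
    z = np.zeros(n)
    for _ in range(iters):
        z = np.maximum(Wmat @ z + x, 0.0)
    return z

def quantize(Wmat, bits):
    # Symmetric uniform quantizer (illustrative assumption).
    scale = np.max(np.abs(Wmat)) / (2 ** (bits - 1) - 1)
    return np.round(Wmat / scale) * scale

z_star = equilibrium(W, x)
results = []
for bits in (4, 6, 8):
    Wq = quantize(W, bits)
    zq = equilibrium(Wq, x)
    err = np.linalg.norm(zq - z_star)
    bound = np.linalg.norm(Wq - W, 2) * np.linalg.norm(zq) / m
    results.append((bits, err, bound))
    print(f"{bits}-bit: equilibrium shift {err:.3e} <= bound {bound:.3e}")
```

The observed shift shrinks with bit width and stays under the margin-based bound, illustrating how the condition number (operator norm over margin) converts quantization precision into forward error.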