Convergence for Discrete Parameter Updates

📅 2025-12-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the efficiency and accuracy degradation in low-precision training caused by post-hoc quantization of continuous parameter updates, this paper proposes a **discrete update paradigm**: update rules that operate exclusively in the discrete parameter space, eliminating real-valued computation and storage. The authors establish a convergence theory for discrete update schemes and construct concrete mechanisms, such as a multinomial update rule. Grounded in discrete dynamical-systems modeling, the approach removes any dependence on continuous variables and aligns naturally with intrinsically discrete model architectures. Experiments indicate that the method improves training efficiency and hardware compatibility while preserving model accuracy, providing a theoretically rigorous and simple-to-implement framework for ultra-low-bit (e.g., sub-2-bit) neural network training.

πŸ“ Abstract
Modern deep learning models require immense computational resources, motivating research into low-precision training. Quantised training addresses this by representing training components in low-bit integers, but typically relies on discretising real-valued updates. We introduce an alternative approach where the update rule itself is discrete, avoiding the quantisation of continuous updates by design. We establish convergence guarantees for a general class of such discrete schemes, and present a multinomial update rule as a concrete example, supported by empirical evaluation. This perspective opens new avenues for efficient training, particularly for models with inherently discrete structure.
Problem

Research questions and friction points this paper is trying to address.

Develops discrete update rules for low-precision training
Establishes convergence guarantees for discrete parameter update schemes
Enables efficient training for models with discrete structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Discrete update rule avoids continuous quantization
Convergence guarantees for general discrete schemes
Multinomial update rule with empirical evaluation
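The multinomial update idea can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's actual rule: here each integer weight takes a step of -1, 0, or +1, where the probability of stepping is proportional to the (clipped) gradient magnitude and the step direction opposes the gradient sign, so the weights never leave the integer lattice and the expected step matches a small gradient-descent move.

```python
import numpy as np

def multinomial_discrete_update(w_int, grad, lr=0.1, rng=None):
    """Illustrative discrete update on integer weights (hypothetical sketch,
    not the paper's exact multinomial rule).

    Each weight moves by -1, 0, or +1: the probability of moving at all is
    clip(lr * |grad|, 0, 1), and the move direction is -sign(grad), so
    E[step] = -lr * grad whenever lr * |grad| <= 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    p_step = np.clip(lr * np.abs(grad), 0.0, 1.0)  # chance of a nonzero step
    move = rng.random(w_int.shape) < p_step        # Bernoulli draw per weight
    step = -np.sign(grad).astype(np.int64)         # step against the gradient
    return w_int + move * step                     # stays on the integer lattice
```

Because the expected step equals the usual continuous gradient step, the rule behaves like an unbiased stochastic rounding of SGD while storing and updating only integers, which is the design-by-construction property the paper's abstract emphasizes.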