Convergence for Discrete Parameter Updates

📅 2025-12-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the efficiency and accuracy degradation in low-precision training caused by post-hoc quantization of continuous parameter updates, this paper proposes a **discrete update paradigm**: update rules that operate exclusively in the discrete parameter space, eliminating real-valued computation and storage. The authors establish a convergence theory for discrete update schemes and construct concrete mechanisms, such as a multinomial update rule. Grounded in discrete dynamical-systems modeling, the approach removes any dependence on continuous variables and aligns naturally with intrinsically discrete model architectures. Experiments indicate that the method improves training efficiency and hardware compatibility while preserving model accuracy, providing a theoretically rigorous and simple-to-implement framework for ultra-low-bit (e.g., sub-2-bit) neural network training.

πŸ“ Abstract
Modern deep learning models require immense computational resources, motivating research into low-precision training. Quantised training addresses this by representing training components in low-bit integers, but typically relies on discretising real-valued updates. We introduce an alternative approach where the update rule itself is discrete, avoiding the quantisation of continuous updates by design. We establish convergence guarantees for a general class of such discrete schemes, and present a multinomial update rule as a concrete example, supported by empirical evaluation. This perspective opens new avenues for efficient training, particularly for models with inherently discrete structure.
Problem

Research questions and friction points this paper is trying to address.

Develops discrete update rules for low-precision training
Establishes convergence guarantees for discrete parameter update schemes
Enables efficient training for models with discrete structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Discrete update rule avoids continuous quantization
Convergence guarantees for general discrete schemes
Multinomial update rule with empirical evaluation
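The multinomial update idea can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's actual rule: here each integer weight takes a step of -1, 0, or +1, where the probability of stepping is proportional to the (clipped) gradient magnitude and the step direction opposes the gradient sign, so the weights never leave the integer lattice and the expected step matches a small gradient-descent move.

```python
import numpy as np

def multinomial_discrete_update(w_int, grad, lr=0.1, rng=None):
    """Illustrative discrete update on integer weights (hypothetical sketch,
    not the paper's exact multinomial rule).

    Each weight moves by -1, 0, or +1: the probability of moving at all is
    clip(lr * |grad|, 0, 1), and the move direction is -sign(grad), so
    E[step] = -lr * grad whenever lr * |grad| <= 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    p_step = np.clip(lr * np.abs(grad), 0.0, 1.0)  # chance of a nonzero step
    move = rng.random(w_int.shape) < p_step        # Bernoulli draw per weight
    step = -np.sign(grad).astype(np.int64)         # step against the gradient
    return w_int + move * step                     # stays on the integer lattice
```

Because the expected step equals the usual continuous gradient step, the rule behaves like an unbiased stochastic rounding of SGD while storing and updating only integers, which is the design-by-construction property the paper's abstract emphasizes.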