SURGE: Surrogate Gradient Adaptation in Binary Neural Networks

📅 2026-05-09

🤖 AI Summary

This work addresses the gradient mismatch and information loss that arise in binary neural networks because binarization operations such as the sign function are non-differentiable. The authors propose SURGE, a learnable gradient compensation framework with two components. A Dual-Path Gradient Compensator (DPGC) attaches a parallel full-precision auxiliary branch to each binarized layer and decouples gradient flow during backpropagation through output decomposition, which lets the auxiliary branch estimate components beyond the first-order approximation of the Straight-Through Estimator (STE). An Adaptive Gradient Scaler (AGS) then balances the gradient contributions of the two branches via norm-based scaling. The authors present the framework as novel and theoretically grounded. Experiments on image classification, object detection, and language understanding show that SURGE outperforms state-of-the-art methods.

📝 Abstract

The training of Binary Neural Networks (BNNs) fundamentally relies on gradient approximation for non-differentiable binarization operations (e.g., the sign function). However, prevailing methods, including the Straight-Through Estimator (STE) and its improved variants, rely on hand-crafted designs that suffer from the gradient mismatch problem and from information loss induced by fixed-range gradient clipping. To address this, we propose SURrogate GradiEnt Adaptation (SURGE), a novel learnable gradient compensation framework with theoretical grounding. SURGE mitigates gradient mismatch through auxiliary backpropagation. Specifically, we design a Dual-Path Gradient Compensator (DPGC) that constructs a parallel full-precision auxiliary branch for each binarized layer, decoupling gradient flow via output decomposition during backpropagation. DPGC enables bias-reduced gradient estimation by leveraging the full-precision branch to estimate components beyond STE's first-order approximation. To further enhance training stability, we introduce an Adaptive Gradient Scaler (AGS), based on an optimal scale factor, that dynamically balances inter-branch gradient contributions via norm-based scaling. Experiments on image classification, object detection, and language understanding tasks demonstrate that SURGE outperforms state-of-the-art methods.
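
For readers unfamiliar with the baseline the abstract criticizes, the following is a minimal NumPy sketch of the standard clipped STE, not of SURGE itself. The function names `binarize_forward` and `ste_backward`, the clipping range of 1.0, and the convention sign(0) = +1 are assumptions made for illustration only.

```python
import numpy as np


def binarize_forward(x):
    # Forward pass: sign function, with the (assumed) convention sign(0) = +1.
    return np.where(x >= 0, 1.0, -1.0)


def ste_backward(x, grad_out, clip=1.0):
    # Backward pass of the clipped STE: treat sign as the identity inside
    # [-clip, clip] and block the gradient outside that fixed range.
    mask = (np.abs(x) <= clip).astype(x.dtype)
    return grad_out * mask


# Toy pre-activations and an upstream gradient of ones.
x = np.array([-2.0, -0.5, 0.0, 0.3, 1.7])
grad_out = np.ones_like(x)

y = binarize_forward(x)
grad_in = ste_backward(x, grad_out)

print(y)        # binarized activations
print(grad_in)  # gradient that actually reaches x
```

In this toy input, the entries at -2.0 and 1.7 receive zero gradient because they fall outside the fixed clipping range, which illustrates the information loss the abstract refers to. The entries inside the range all receive the same unit gradient regardless of their distance from zero, which illustrates the gradient mismatch between the sign function and its identity surrogate.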

Problem

Research questions and friction points this paper is trying to address.

Binary Neural Networks

gradient mismatch

Straight-Through Estimator

gradient approximation

information loss

Innovation

Methods, ideas, or system contributions that make the work stand out.

Binary Neural Networks

Surrogate Gradient

Gradient Mismatch