🤖 AI Summary
This paper addresses the high parameter redundancy and substantial memory overhead in large language model (LLM) training. It introduces rate-distortion optimization (RDO) into the training process for the first time, proposing an end-to-end dynamic compression framework. Methodologically, it integrates Lagrangian multiplier-based rate constraint control, gradient reweighting, and structure-aware sparsification to enable controllable trade-offs between accuracy and parameter complexity. Key contributions include: (1) suppressing redundancy from the training outset, enabling robust pruning at high ratios; (2) achieving zero accuracy loss under 80% parameter pruning, with 60–90% memory reduction; and (3) significantly outperforming post-training compression methods in accuracy, generalization, robustness, and edge deployment efficiency.
📝 Abstract
The rapid advancement of large language models (LLMs) has driven extensive research into parameter compression after training has been completed, yet compression during the training phase remains largely unexplored. In this work, we introduce Rate-Constrained Training (Backslash), a novel training-time compression approach based on rate-distortion optimization (RDO). Backslash enables a flexible trade-off between model accuracy and complexity, significantly reducing parameter redundancy while preserving performance. Experiments across various architectures and tasks demonstrate that Backslash can reduce memory usage by 60%–90% without accuracy loss and provides significant compression gains compared to post-training compression. Moreover, Backslash proves to be highly versatile: it enhances generalization with small Lagrange multipliers, improves model robustness to pruning (maintaining accuracy even at 80% pruning rates), and enables network simplification for accelerated inference on edge devices.
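The Lagrange-multiplier trade-off described above can be sketched as a penalized training loss of the form J = D + λ·R. Note this is only a minimal illustration: the paper does not specify its rate model here, so the L1-norm rate proxy and the toy values below are assumptions chosen because L1 penalties encourage the sparsity that later enables aggressive pruning.

```python
# Minimal sketch of a rate-distortion optimization (RDO) training objective:
# J = D + lambda * R, where D is the task (distortion) loss and R is a
# parameter-rate penalty. The L1 proxy for R is an assumption for illustration,
# not the actual rate model used by Backslash.

def rdo_loss(task_loss, params, lam):
    """Return the rate-constrained objective J = D + lambda * R."""
    rate = sum(abs(w) for w in params)  # L1 proxy for parameter "rate" (assumption)
    return task_loss + lam * rate

# Toy usage: a larger Lagrange multiplier shifts the optimum toward
# lower-rate (sparser, more compressible) parameters at some cost in accuracy.
params = [0.5, -1.2, 0.0, 3.0]
low_pressure = rdo_loss(2.0, params, lam=0.01)
high_pressure = rdo_loss(2.0, params, lam=0.5)
```

A small λ behaves like a mild regularizer (the generalization benefit mentioned above), while a large λ pushes many weights toward zero, making high pruning ratios survivable.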