Budget-aware Auto Optimizer Configurator

📅 2026-05-06
📈 Citations: 0
✨ Influential: 0

📝 Abstract
Optimizer states occupy massive GPU memory in large-scale model training. However, gradients in different network blocks exhibit distinct behaviors, such as varying directional stability and scale anisotropy, implying that expensive optimizer states are not universally necessary and using a global optimizer is often memory-inefficient. We propose the Budget-Aware Optimizer Configurator (BAOC) to reduce memory cost by assigning suitable optimizer configurations to individual blocks under given budgets. Specifically, BAOC samples gradient streams to derive statistical metrics that quantify the potential performance risk of applying cheaper configurations (e.g., low precision or removing momentum). It then solves a constrained allocation problem to minimize total risk under memory and time budgets, selecting a budget-feasible configuration for each block. Experiments across vision, language, and diffusion workloads demonstrate that BAOC maintains training quality while significantly reducing the memory usage of optimizer states. The code is available at https://anonymous.4open.science/r/BAOC-45C6.
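The allocation step described in the abstract (one configuration per block, minimum total risk, bounded memory) can be illustrated with a small sketch. This is not the authors' implementation: the function name `allocate`, the configuration names, and all memory and risk numbers are hypothetical, the risk scores are assumed to have been computed already from sampled gradients, and the time budget is left out for brevity. Under those assumptions the problem is a multiple-choice knapsack, which a short dynamic program solves exactly when memory costs are integers.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-block table: config name -> (memory units, risk score).
Options = Dict[str, Tuple[int, float]]


def allocate(blocks: List[Options], budget: int) -> Optional[Tuple[float, List[str]]]:
    """Pick one config per block, minimizing total risk with total memory <= budget.

    Exact dynamic program for the multiple-choice knapsack (integer memory costs).
    Returns (total_risk, chosen_configs), or None if no assignment fits the budget.
    """
    # best[m] = lowest-risk partial assignment that uses exactly m memory units
    best: Dict[int, Tuple[float, List[str]]] = {0: (0.0, [])}
    for options in blocks:
        nxt: Dict[int, Tuple[float, List[str]]] = {}
        for used, (risk, picks) in best.items():
            for name, (mem, r) in options.items():
                total = used + mem
                if total > budget:
                    continue  # this choice would exceed the memory budget
                cand = (risk + r, picks + [name])
                if total not in nxt or cand[0] < nxt[total][0]:
                    nxt[total] = cand
        best = nxt
        if not best:
            return None  # no feasible configuration for this block
    return min(best.values(), key=lambda t: t[0])


# Illustrative numbers only: three blocks, three candidate configs each.
blocks = [
    {"adam_fp32": (8, 0.0), "adam_8bit": (2, 0.125), "sgd_no_momentum": (0, 1.0)},
    {"adam_fp32": (8, 0.0), "adam_8bit": (2, 0.5), "sgd_no_momentum": (0, 0.75)},
    {"adam_fp32": (8, 0.0), "adam_8bit": (2, 0.0625), "sgd_no_momentum": (0, 0.25)},
]
result = allocate(blocks, budget=12)
print(result)  # (0.1875, ['adam_8bit', 'adam_fp32', 'adam_8bit'])
```

In this toy instance the budget admits full-precision state for only one block, and the sketch spends it on the block whose cheaper configurations carry the highest risk. A second budget such as time could be handled by extending the table key to a (memory, time) pair, at the cost of a larger state space.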
Problem

Research questions and friction points this paper is trying to address.

optimizer states
memory efficiency
large-scale model training
budget-aware configuration
gradient behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

optimizer state compression
memory-efficient training
adaptive optimizer configuration
budget-aware optimization
gradient statistics