ANCRe: Adaptive Neural Connection Reassignment for Efficient Depth Scaling

πŸ“… 2026-02-09
πŸ€– AI Summary
This work addresses a limitation of default residual connection layouts in deep neural networks: they hinder effective utilization of model depth and slow convergence. From an optimization perspective, the authors prove that the arrangement of residual connections can induce an exponential gap in convergence rates. To overcome this, they propose ANCRe (Adaptive Neural Connection Reassignment), a lightweight, learnable, data-driven framework that optimizes residual connectivity during training. Grounded in optimization theory, ANCRe couples a parameterized connection-learning mechanism with an adaptive reassignment strategy, making it applicable to large language models, diffusion models, and deep ResNets. Extensive experiments show that ANCRe consistently accelerates convergence, improves final performance, and enhances depth efficiency across diverse architectures, with computational and memory overhead below 1%.

πŸ“ Abstract
Scaling network depth has been a central driver behind the success of modern foundation models, yet recent investigations suggest that deep layers are often underutilized. This paper revisits the default mechanism for deepening neural networks, namely residual connections, from an optimization perspective. Rigorous analysis proves that the layout of residual connections can fundamentally shape convergence behavior, and even induce an exponential gap in convergence rates. Prompted by this insight, we introduce adaptive neural connection reassignment (ANCRe), a principled and lightweight framework that parameterizes and learns residual connectivities from the data. ANCRe adaptively reassigns residual connections with negligible computational and memory overhead ($<1\%$), while enabling more effective utilization of network depth. Extensive numerical tests across pre-training of large language models, diffusion models, and deep ResNets demonstrate consistently accelerated convergence, boosted performance, and enhanced depth efficiency over conventional residual connections.
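The abstract's core idea, parameterizing residual connectivity so it can be learned and then reassigned, can be pictured with a toy sketch. Everything below (the class name, the softmax mixture over earlier layer outputs, and the hard `reassign` step) is a hypothetical illustration of the general mechanism, not the paper's actual implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

class AdaptiveResidualStack:
    """Toy sketch of learnable residual connectivity: each layer's residual
    input is a softmax-weighted mix over all earlier layer outputs, with the
    mixture logits treated as trainable parameters (names are illustrative)."""

    def __init__(self, depth, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.depth = depth
        # one toy linear "block" per layer
        self.W = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(depth)]
        # logits over candidate residual sources: layer l may draw its
        # residual from any of the l+1 earlier outputs (incl. the input)
        self.conn_logits = [np.zeros(l + 1) for l in range(depth)]

    def forward(self, x):
        outputs = [x]  # outputs[0] is the network input
        for l in range(self.depth):
            alpha = softmax(self.conn_logits[l])        # learned mixture
            residual = sum(a * h for a, h in zip(alpha, outputs))
            outputs.append(np.tanh(outputs[-1] @ self.W[l]) + residual)
        return outputs[-1]

    def reassign(self):
        """Harden each layer's residual source to its current argmax,
        mimicking an adaptive reassignment step; returns the chosen
        source index per layer."""
        picks = []
        for l in range(self.depth):
            k = int(np.argmax(self.conn_logits[l]))
            hard = np.full_like(self.conn_logits[l], -1e9)
            hard[k] = 0.0
            self.conn_logits[l] = hard
            picks.append(k)
        return picks
```

A conventional residual network corresponds to fixing each layer's source to its immediate predecessor; letting the logits vary is one way the "exponential gap" between connection layouts could be exploited during training.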
Problem

Research questions and friction points this paper is trying to address.

depth scaling
residual connections
convergence efficiency
underutilized deep layers
neural network optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Neural Connection Reassignment
Residual Connections
Depth Scaling
Convergence Optimization
Efficient Deep Learning