Deep Hierarchical Learning with Nested Subspace Networks

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Training large neural networks under a fixed computational budget imposes a rigid trade-off between performance and efficiency, hindering deployment in resource-constrained or dynamically varying environments. This paper introduces Nested Subspace Networks (NSNs), a novel architecture that enforces a nested subspace property in the parameterization of linear layers, ensuring that low-rank inference paths are mathematically embedded within higher-rank ones. NSNs combine an uncertainty-aware joint optimization objective, a dynamic rank selection mechanism, and a hierarchical weighted loss to enable smooth, single-model adaptation across a continuous spectrum of computational budgets, and they support lightweight adaptation of pretrained large language models (LLMs). Evaluated on LLMs, NSNs achieve a 50% reduction in inference FLOPs with only a 5 percentage point drop in accuracy, yielding a controllable, continuous compute-accuracy Pareto frontier.

📝 Abstract
Large neural networks are typically trained for a fixed computational budget, creating a rigid trade-off between performance and efficiency that is ill-suited for deployment in resource-constrained or dynamic environments. Existing approaches to this problem present a difficult choice: training a discrete collection of specialist models is computationally prohibitive, while dynamic methods like slimmable networks often lack the flexibility to be applied to large, pre-trained foundation models. In this work, we propose Nested Subspace Networks (NSNs), a novel architectural paradigm that enables a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets at inference time. The core of our approach is to re-parameterize linear layers to satisfy a nested subspace property, such that the function computed at a given rank is a strict subspace of the function at any higher rank. We show that this entire hierarchy of models can be optimized jointly via an uncertainty-aware objective that learns to balance the contributions of different ranks based on their intrinsic difficulty. We demonstrate empirically that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier. For example, a single NSN-adapted model can achieve a 50% reduction in inference FLOPs with only a 5 percentage point loss in accuracy. Our findings establish NSNs as a powerful framework for creating the next generation of adaptive foundation models.
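The nested subspace property described in the abstract can be sketched as a factored linear layer in which the rank-r map uses only the first r factor columns and rows, so every low-rank path is a strict prefix of any higher-rank one. The class name, initialization, and NumPy formulation below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

class NestedLowRankLinear:
    """Illustrative sketch of a nested-subspace linear layer.

    The full weight is factored as W = U @ V with U of shape
    (d_out, max_rank) and V of shape (max_rank, d_in). Evaluating at
    rank r uses only the first r columns of U and first r rows of V,
    so the rank-r function is mathematically embedded in every
    higher-rank function (adding rank only adds rank-1 terms).
    """

    def __init__(self, d_in, d_out, max_rank, seed=0):
        rng = np.random.default_rng(seed)
        # Simple scaled Gaussian init (an assumption for this sketch).
        self.U = rng.standard_normal((d_out, max_rank)) / np.sqrt(max_rank)
        self.V = rng.standard_normal((max_rank, d_in)) / np.sqrt(d_in)
        self.max_rank = max_rank

    def forward(self, x, rank):
        """Compute y = U[:, :rank] @ (V[:rank, :] @ x)."""
        assert 1 <= rank <= self.max_rank
        return self.U[:, :rank] @ (self.V[:rank, :] @ x)
```

Because each rank adds only new rank-1 components, the output at rank r+1 equals the output at rank r plus the (r+1)-th component, which is what lets a single weight matrix serve a continuous spectrum of compute budgets at inference time.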
Problem

Research questions and friction points this paper is trying to address.

Enabling dynamic model adjustment for varying computational budgets
Overcoming rigid trade-offs between performance and efficiency
Applying surgical adaptation to pre-trained foundation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reparameterizes linear layers to satisfy a nested subspace property
Optimizes the full rank hierarchy jointly via an uncertainty-aware objective
Enables surgical adaptation of pre-trained LLMs
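The uncertainty-aware joint objective mentioned above can be sketched in the style of homoscedastic-uncertainty task weighting (Kendall et al., 2018), where a learned log-variance per rank balances easy and intrinsically hard ranks. The exact form of the paper's objective is not given here, so this is an assumed formulation:

```python
import numpy as np

def uncertainty_weighted_loss(rank_losses, log_sigmas):
    """Joint objective over the rank hierarchy (illustrative sketch).

    Assumed form, following Kendall-style uncertainty weighting:
        L = sum_r exp(-s_r) * L_r + s_r,   with s_r = log(sigma_r^2).
    A learned s_r down-weights ranks whose loss is intrinsically
    harder, while the + s_r term keeps s_r from growing unboundedly.
    """
    rank_losses = np.asarray(rank_losses, dtype=float)
    log_sigmas = np.asarray(log_sigmas, dtype=float)
    return float(np.sum(np.exp(-log_sigmas) * rank_losses + log_sigmas))
```

With all `s_r = 0` this reduces to a plain sum of per-rank losses; during training, gradients through `s_r` let the model rebalance the rank hierarchy automatically rather than relying on hand-tuned weights.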