Deep Hierarchical Learning with Nested Subspace Networks

📅 2025-09-22

📈 Citations: 0

✨ Influential: 0

career value

180K/year

🤖 AI Summary

Training large neural networks under fixed computational budgets imposes a rigid trade-off between performance and efficiency, hindering deployment in resource-constrained or dynamically varying environments. This paper introduces Nested Subspace Networks (NSN), the first architecture to enforce a nested subspace property in linear layer parameterization—ensuring that low-rank inference paths are mathematically embedded within higher-rank models. NSN integrates an uncertainty-aware joint optimization objective, a dynamic rank selection mechanism, and hierarchical weighted loss to enable smooth, single-model adaptation across a continuous spectrum of computational budgets. The method is compatible with lightweight adaptation of pretrained large language models (LLMs). Evaluated on LLMs, NSN achieves a 50% reduction in inference FLOPs with only a 5% accuracy drop, yielding a controllable and continuous computation–accuracy Pareto frontier.

Technology Category

Application Category

📝 Abstract

Large neural networks are typically trained for a fixed computational budget, creating a rigid trade-off between performance and efficiency that is ill-suited for deployment in resource-constrained or dynamic environments. Existing approaches to this problem present a difficult choice: training a discrete collection of specialist models is computationally prohibitive, while dynamic methods like slimmable networks often lack the flexibility to be applied to large, pre-trained foundation models. In this work, we propose Nested Subspace Networks (NSNs), a novel architectural paradigm that enables a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets at inference time. The core of our approach is to re-parameterize linear layers to satisfy a nested subspace property, such that the function computed at a given rank is a strict subspace of the function at any higher rank. We show that this entire hierarchy of models can be optimized jointly via an uncertainty-aware objective that learns to balance the contributions of different ranks based on their intrinsic difficulty. We demonstrate empirically that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier. For example, a single NSN-adapted model can achieve a 50% reduction in inference FLOPs with only a 5 percentage point loss in accuracy. Our findings establish NSNs as a powerful framework for creating the next generation of adaptive foundation models.

Problem

Research questions and friction points this paper is trying to address.

Enabling dynamic model adjustment for varying computational budgets

Overcoming rigid trade-offs between performance and efficiency

Applying surgical adaptation to pre-trained foundation models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reparameterizes layers for nested subspace property

Uses uncertainty-aware objective for joint optimization

Enables surgical adaptation to pre-trained LLMs

🔎 Similar Papers

No similar papers found.