🤖 AI Summary
Training large neural networks under fixed computational budgets imposes a rigid trade-off between performance and efficiency, hindering deployment in resource-constrained or dynamically varying environments. This paper introduces Nested Subspace Networks (NSNs), an architecture that enforces a nested subspace property in the parameterization of linear layers, so that low-rank inference paths are mathematically embedded within higher-rank ones. NSNs combine an uncertainty-aware joint optimization objective, a dynamic rank selection mechanism, and a hierarchical weighted loss to let a single model adapt smoothly across a continuous spectrum of computational budgets. The method can be applied as a lightweight adaptation of pre-trained large language models (LLMs). Evaluated on LLMs, a single NSN-adapted model achieves a 50% reduction in inference FLOPs with only a 5 percentage point drop in accuracy, yielding a controllable, continuous compute-accuracy frontier.
📝 Abstract
Large neural networks are typically trained for a fixed computational budget, creating a rigid trade-off between performance and efficiency that is ill-suited for deployment in resource-constrained or dynamic environments. Existing approaches to this problem force a difficult choice: training a discrete collection of specialist models is computationally prohibitive, while dynamic methods like slimmable networks often lack the flexibility to be applied to large, pre-trained foundation models. In this work, we propose Nested Subspace Networks (NSNs), a novel architectural paradigm that enables a single model to be dynamically and granularly adjusted across a continuous spectrum of compute budgets at inference time. The core of our approach is to re-parameterize linear layers to satisfy a nested subspace property, such that the functions computable at a given rank form a strict subspace of those computable at any higher rank. We show that this entire hierarchy of models can be optimized jointly via an uncertainty-aware objective that learns to balance the contributions of different ranks based on their intrinsic difficulty. We demonstrate empirically that NSNs can be surgically applied to pre-trained LLMs and unlock a smooth and predictable compute-performance frontier. For example, a single NSN-adapted model can achieve a 50% reduction in inference FLOPs with only a 5 percentage point loss in accuracy. Our findings establish NSNs as a powerful framework for creating the next generation of adaptive foundation models.
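To make the nested subspace idea concrete, here is a minimal NumPy sketch of one way such a layer could be parameterized. It is an illustration under stated assumptions, not the authors' implementation: the abstract gives neither the exact factorization nor the loss. The class name `NestedLowRankLinear`, the factorization W_r = B[:, :r] A[:r, :], and the function `uncertainty_weighted_loss` (a generic learned log-variance weighting, swapped in for the paper's uncertainty-aware objective) are all hypothetical.

```python
import numpy as np


class NestedLowRankLinear:
    """Hypothetical sketch: W_r = B[:, :r] @ A[:r, :] for any rank r <= max_rank.

    Because rank r reuses the first r components of the same factors, the
    rank-r weight matrix is contained in every higher-rank one.
    """

    def __init__(self, in_features, out_features, max_rank, seed=0):
        rng = np.random.default_rng(seed)
        self.in_features = in_features
        self.out_features = out_features
        self.max_rank = max_rank
        # A projects the input down to max_rank dimensions; B maps back up.
        self.A = rng.standard_normal((max_rank, in_features)) / np.sqrt(in_features)
        self.B = rng.standard_normal((out_features, max_rank)) / np.sqrt(max_rank)

    def forward(self, x, rank):
        """Apply the rank-`rank` sub-model to x of shape (batch, in_features)."""
        if not 1 <= rank <= self.max_rank:
            raise ValueError("rank must be in [1, max_rank]")
        hidden = x @ self.A[:rank].T          # (batch, rank)
        return hidden @ self.B[:, :rank].T    # (batch, out_features)

    def flops(self, rank):
        """Multiply-accumulate count per input vector at the given rank."""
        return rank * (self.in_features + self.out_features)


def uncertainty_weighted_loss(per_rank_losses, log_variances):
    """Assumed stand-in for the joint objective: each rank's loss is scaled by
    exp(-s_r) and regularized by s_r, where s_r is a learnable log-variance."""
    losses = np.asarray(per_rank_losses, dtype=float)
    s = np.asarray(log_variances, dtype=float)
    return float(np.sum(0.5 * (np.exp(-s) * losses + s)))


layer = NestedLowRankLinear(in_features=64, out_features=64, max_rank=32)
x = np.random.default_rng(1).standard_normal((4, 64))
y_low = layer.forward(x, rank=16)
y_high = layer.forward(x, rank=32)

# Nestedness: the rank-32 output is the rank-16 output plus the
# contribution of components 16..31.
extra = (x @ layer.A[16:32].T) @ layer.B[:, 16:32].T
assert np.allclose(y_high, y_low + extra)

print(layer.flops(16) / layer.flops(32))  # 0.5
```

In this toy layer the cost is linear in the chosen rank, so halving the rank halves the multiply-accumulate count relative to the full-rank factorized layer. That ratio is a property of the sketch only and should not be read as the paper's measured 50% FLOPs reduction, which is reported for whole NSN-adapted LLMs.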