🤖 AI Summary
This paper identifies the “scaling paradox” in time-series forecasting: increasing model capacity or dataset size can unexpectedly degrade performance. Through extensive empirical analysis, we discover, for the first time, a “few-layer dominance” phenomenon across mainstream time-series models: only a small subset of layers contributes critically to representation learning, while the majority are redundant or even detrimental to training. To address this, we propose a general, fully automated framework for layer importance estimation and pruning, which identifies and retains critical layers without fine-tuning. Evaluated on LLM4TS and TSFM architectures, our method achieves up to a 12% accuracy gain and a 2.7× inference speedup using only 21% of the parameters. Moreover, on over 95% of benchmark tasks, models retaining ≤30% of layers match or exceed the performance of full models, demonstrating substantial improvements in the efficiency–accuracy trade-off.
📝 Abstract
Large-scale models are at the forefront of time series (TS) forecasting, dominated by two paradigms: fine-tuning text-based Large Language Models (LLM4TS) and training Time Series Foundation Models (TSFMs) from scratch. Both approaches share the foundational assumption that scaling up model capacity and data volume leads to improved performance. However, we observe a ***scaling paradox*** in TS models: puzzlingly, larger models do *not* achieve better performance. Through extensive experiments on two model families across four scales (100M to 1.7B parameters) and diverse data (up to 6B observations), we rigorously confirm that the scaling paradox is a pervasive issue. We then diagnose its root cause by analyzing internal representations, identifying a phenomenon we call *few-layer dominance*: only a small subset of layers is functionally important, while the majority are redundant, under-utilized, and can even impair training. Based on this discovery, we propose a practical method to automatically identify and retain only these dominant layers. In our models, retaining only 21% of the parameters achieves up to a 12% accuracy improvement and a 2.7× inference speedup. We validate the universality of our method on 8 prominent SOTA models (LLM4TS and TSFMs, 90M to 6B parameters), showing that retaining fewer than 30% of layers achieves comparable or superior accuracy on over 95% of tasks.
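To make the idea of identifying dominant layers concrete, here is a minimal, hypothetical sketch of skip-based layer-importance pruning. The paper's actual importance criterion is not specified in this abstract; scoring a layer by how much the loss degrades when that layer is replaced with the identity is one common ablation proxy, used here purely for illustration, with layers modeled as plain Python functions rather than a real transformer stack.

```python
def run(layers, x, skip=None):
    """Apply each layer function in sequence, optionally skipping one index."""
    for i, layer in enumerate(layers):
        if i != skip:
            x = layer(x)
    return x

def layer_importance(layers, x, target, loss):
    """Importance of layer i = loss increase when layer i is skipped."""
    base = loss(run(layers, x), target)
    return [loss(run(layers, x, skip=i), target) - base
            for i in range(len(layers))]

def prune(layers, scores, keep_ratio=0.3):
    """Retain only the highest-scoring fraction of layers, preserving order."""
    k = max(1, int(len(layers) * keep_ratio))
    keep = sorted(sorted(range(len(layers)),
                         key=lambda i: scores[i], reverse=True)[:k])
    return [layers[i] for i in keep]

# Toy stack: two functionally important layers among near-identity ones,
# mimicking the "few-layer dominance" pattern.
layers = [
    lambda x: x * 2.0,       # dominant
    lambda x: x + 1e-6,      # redundant (near-identity)
    lambda x: x + 3.0,       # dominant
    lambda x: x * 1.000001,  # redundant (near-identity)
]
x, target = 1.0, 5.0
loss = lambda pred, t: (pred - t) ** 2

scores = layer_importance(layers, x, target, loss)
pruned = prune(layers, scores, keep_ratio=0.5)
print(len(pruned))  # 2 of 4 layers retained
```

In this toy setup, skipping either near-identity layer barely changes the loss, so the pruned model keeps only the two dominant layers yet produces essentially the same output, mirroring the paper's claim that retaining a small fraction of layers can match the full model.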