Xihe: Scalable Zero-Shot Time Series Learner Via Hierarchical Interleaved Block Attention

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing time-series foundation models (TSFMs) struggle to capture multi-scale temporal dependencies in zero-shot cross-dataset transfer, particularly under pattern heterogeneity and sampling-rate discrepancies between source and target domains. To address this, we propose Hierarchical Interleaved Block Attention (HIBA), a novel architecture that jointly employs intra-block local sparse attention and inter-block dynamic global modeling to simultaneously capture fine-grained local dynamics and coarse-grained long-range dependencies. Leveraging HIBA, we develop the scalable Xihe model family, spanning 9.5M to 1.5B parameters. Evaluated on the GIFT-Eval benchmark, Xihe achieves substantial gains: Xihe-tiny (9.5M) outperforms most mainstream models, while Xihe-max (1.5B) establishes a new state-of-the-art for zero-shot transfer—significantly surpassing prior best methods.

📝 Abstract
The rapid advancement of time series foundation models (TSFMs) has been propelled by migrating architectures from language models. While existing TSFMs demonstrate impressive performance, their direct adoption of cross-domain architectures constrains effective capture of the multi-scale temporal dependencies inherent to time series data. This limitation becomes particularly pronounced during zero-shot transfer across datasets with divergent underlying patterns and sampling strategies. To address these challenges, we propose Hierarchical Interleaved Block Attention (HIBA), which employs hierarchical inter- and intra-block sparse attention to effectively capture multi-scale dependencies. Intra-block attention facilitates local information exchange, while inter-block attention operates across blocks to capture global temporal pattern interaction and dynamic evolution. Leveraging the HIBA architecture, we introduce Xihe, a scalable TSFM family spanning from an ultra-efficient 9.5M-parameter configuration to a high-capacity 1.5B variant. Evaluated on the comprehensive GIFT-Eval benchmark, our most compact Xihe-tiny model (9.5M) surpasses the majority of contemporary TSFMs, demonstrating remarkable parameter efficiency. More impressively, Xihe-max (1.5B) establishes new state-of-the-art zero-shot performance, surpassing previous best results by a substantial margin. This consistent performance across the entire parameter spectrum provides compelling evidence for the strong generalization capabilities and architectural advantages of HIBA.
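To make the intra-block half of this design concrete, below is a minimal PyTorch sketch of block-local attention: the token sequence is split into fixed-size blocks and attention is computed only within each block, exchanging fine-grained local information at a cost linear in sequence length. The module name, the block-size handling, and the fold-blocks-into-batch trick are illustrative assumptions, not the paper's exact sparse pattern.

```python
# Hypothetical sketch of intra-block (local) attention.
# Assumes seq_len is divisible by block_size; the paper's actual sparse
# attention pattern and parameterization may differ.
import torch
import torch.nn as nn


class IntraBlockAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, block_size: int):
        super().__init__()
        self.block_size = block_size
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        nb = t // self.block_size
        # Fold blocks into the batch dimension so each block attends only to itself.
        xb = x.reshape(b * nb, self.block_size, d)
        out, _ = self.attn(xb, xb, xb, need_weights=False)
        return out.reshape(b, t, d)


# Example: a batch of 2 series, 64 tokens of width 32, blocks of 16 tokens.
# IntraBlockAttention(d_model=32, n_heads=4, block_size=16)(torch.randn(2, 64, 32)).shape -> (2, 64, 32)
```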
Problem

Research questions and friction points this paper is trying to address.

Capturing multiscale temporal dependencies in time series data
Enhancing zero-shot transfer across datasets with divergent patterns
Overcoming limitations of cross-domain architectures in time series models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Interleaved Block Attention for multi-scale dependencies
Intra-block attention enables local information exchange
Inter-block attention captures global temporal pattern interaction (see the sketch after this list)
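As a companion to the intra-block sketch above, the following hypothetical snippet illustrates the inter-block half and one way the two could be interleaved: each block is pooled to a summary token, the summaries attend to one another globally, and the mixed summaries are broadcast back onto the block's tokens. Mean pooling, the residual broadcast, and the HIBALayer name are assumptions for illustration rather than the paper's stated design.

```python
# Hypothetical sketch of inter-block (global) attention and its interleaving
# with a block-local attention module such as the IntraBlockAttention sketch above.
import torch
import torch.nn as nn


class InterBlockAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, block_size: int):
        super().__init__()
        self.block_size = block_size
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); seq_len assumed divisible by block_size
        b, t, d = x.shape
        nb = t // self.block_size
        blocks = x.reshape(b, nb, self.block_size, d)
        summaries = blocks.mean(dim=2)  # one summary token per block
        mixed, _ = self.attn(summaries, summaries, summaries, need_weights=False)
        # Broadcast the globally mixed summaries back to every token in each block.
        return (blocks + mixed.unsqueeze(2)).reshape(b, t, d)


class HIBALayer(nn.Module):
    """Interleaves a local (intra-block) and a global (inter-block) attention module."""

    def __init__(self, local_attn: nn.Module, global_attn: nn.Module):
        super().__init__()
        self.local_attn = local_attn
        self.global_attn = global_attn

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.local_attn(x)   # fine-grained local dynamics
        x = x + self.global_attn(x)  # coarse-grained long-range structure
        return x
```

Stacking such layers with varying depth and width is one plausible way to realize the 9.5M-to-1.5B scaling described above; the actual Xihe configurations are not specified here.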
Yinbo Sun
Ant Group, Hangzhou, China
Yuchen Fang
Ant Group, Hangzhou, China
Zhibo Zhu
Ant Group, Hangzhou, China
Jia Li
Ant Group, Hangzhou, China
Yu Liu
Ant Group, Hangzhou, China
Qiwen Deng
Ant Group, Hangzhou, China
Jun Zhou
Ant Group, Hangzhou, China
Hang Yu
Ant Group, Hangzhou, China
Xingyu Lu
Ant Group, Hangzhou, China
Lintao Ma
Ant Group
bayesian learning, time series analysis, generative models