🤖 AI Summary
Existing time-series foundation models (TSFMs) struggle to capture multi-scale temporal dependencies in zero-shot cross-dataset transfer, particularly under pattern heterogeneity and sampling-rate discrepancies between source and target domains. To address this, we propose Hierarchical Interleaved Block Attention (HIBA), a novel architecture that jointly employs intra-block local sparse attention and inter-block dynamic global modeling to simultaneously capture fine-grained local dynamics and coarse-grained long-range dependencies. Leveraging HIBA, we develop the scalable Xihe model family, spanning 9.5M to 1.5B parameters. Evaluated on the GIFT-Eval benchmark, Xihe achieves substantial gains: Xihe-tiny (9.5M) outperforms most mainstream models, while Xihe-max (1.5B) establishes a new state-of-the-art for zero-shot transfer—significantly surpassing prior best methods.
📝 Abstract
The rapid advancement of time series foundation models (TSFMs) has been propelled by migrating architectures from language models. While existing TSFMs demonstrate impressive performance, their direct adoption of cross-domain architectures constrains the effective capture of multi-scale temporal dependencies inherent to time series data. This limitation becomes particularly pronounced during zero-shot transfer across datasets with divergent underlying patterns and sampling strategies. To address these challenges, we propose Hierarchical Interleaved Block Attention (HIBA), which employs hierarchical inter- and intra-block sparse attention to effectively capture multi-scale dependencies. Intra-block attention facilitates local information exchange, while inter-block attention operates across blocks to capture global temporal pattern interaction and dynamic evolution. Leveraging the HIBA architecture, we introduce Xihe, a scalable TSFM family spanning from an ultra-efficient 9.5M-parameter configuration to a high-capacity 1.5B-parameter variant. Evaluated on the comprehensive GIFT-Eval benchmark, our most compact Xihe-tiny model (9.5M) surpasses the majority of contemporary TSFMs, demonstrating remarkable parameter efficiency. More impressively, Xihe-max (1.5B) establishes new state-of-the-art zero-shot performance, surpassing the previous best results by a substantial margin. This consistent excellence across the entire parameter spectrum provides compelling evidence for the strong generalization capability and architectural advantages of HIBA.
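The two-level attention scheme described above — local exchange within blocks, then global interaction across blocks — can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the block size, the mean-pooled block summaries, and the residual combination of local and global signals are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def hierarchical_block_attention(x, block_size):
    """Toy two-level block attention.

    x: (seq_len, d_model); seq_len must be divisible by block_size.
    """
    seq_len, d = x.shape
    n_blocks = seq_len // block_size
    blocks = x.reshape(n_blocks, block_size, d)

    # Intra-block attention: each position attends only within its block,
    # capturing fine-grained local dynamics.
    local = attention(blocks, blocks, blocks)        # (n_blocks, block_size, d)

    # Inter-block attention: mean-pooled block summaries (an assumption here)
    # attend to one another, modeling coarse-grained global dependencies.
    summaries = local.mean(axis=1)                   # (n_blocks, d)
    global_ctx = attention(summaries, summaries, summaries)  # (n_blocks, d)

    # Broadcast each block's global context back to its positions (residual mix).
    out = local + global_ctx[:, None, :]
    return out.reshape(seq_len, d)
```

The key property the sketch illustrates is cost: intra-block attention is O(n · b) per position and inter-block attention operates over only n/b summaries, avoiding the O(n²) cost of full attention over long series.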