Xihe: Scalable Zero-Shot Time Series Learner Via Hierarchical Interleaved Block Attention

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing time-series foundation models (TSFMs) struggle to capture multi-scale temporal dependencies in zero-shot cross-dataset transfer, particularly under pattern heterogeneity and sampling-rate discrepancies between source and target domains. To address this, we propose Hierarchical Interleaved Block Attention (HIBA), a novel architecture that jointly employs intra-block local sparse attention and inter-block dynamic global modeling to simultaneously capture fine-grained local dynamics and coarse-grained long-range dependencies. Leveraging HIBA, we develop the scalable Xihe model family, spanning 9.5M to 1.5B parameters. Evaluated on the GIFT-Eval benchmark, Xihe achieves substantial gains: Xihe-tiny (9.5M) outperforms most mainstream models, while Xihe-max (1.5B) establishes a new state-of-the-art for zero-shot transfer—significantly surpassing prior best methods.

📝 Abstract
The rapid advancement of time series foundation models (TSFMs) has been propelled by migrating architectures from language models. While existing TSFMs demonstrate impressive performance, their direct adoption of cross-domain architectures constrains effective capture of the multi-scale temporal dependencies inherent to time series data. This limitation becomes particularly pronounced during zero-shot transfer across datasets with divergent underlying patterns and sampling strategies. To address these challenges, we propose Hierarchical Interleaved Block Attention (HIBA), which employs hierarchical inter- and intra-block sparse attention to effectively capture multi-scale dependencies. Intra-block attention facilitates local information exchange, while inter-block attention operates across blocks to capture global temporal pattern interaction and dynamic evolution. Leveraging the HIBA architecture, we introduce Xihe, a scalable TSFM family spanning from an ultra-efficient 9.5M-parameter configuration to a high-capacity 1.5B variant. Evaluated on the comprehensive GIFT-Eval benchmark, our most compact Xihe-tiny model (9.5M) surpasses the majority of contemporary TSFMs, demonstrating remarkable parameter efficiency. More impressively, Xihe-max (1.5B) establishes new state-of-the-art zero-shot performance, surpassing previous best results by a substantial margin. This consistent performance across the entire parameter spectrum provides compelling evidence for the strong generalization capabilities and architectural advantages of HIBA.
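To make the intra-block half of this design concrete, below is a minimal PyTorch sketch of block-local attention: the token sequence is split into fixed-size blocks and attention is computed only within each block, exchanging fine-grained local information at a cost linear in sequence length. The module name, the block-size handling, and the fold-blocks-into-batch trick are illustrative assumptions, not the paper's exact sparse pattern.

```python
# Hypothetical sketch of intra-block (local) attention.
# Assumes seq_len is divisible by block_size; the paper's actual sparse
# attention pattern and parameterization may differ.
import torch
import torch.nn as nn


class IntraBlockAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, block_size: int):
        super().__init__()
        self.block_size = block_size
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        nb = t // self.block_size
        # Fold blocks into the batch dimension so each block attends only to itself.
        xb = x.reshape(b * nb, self.block_size, d)
        out, _ = self.attn(xb, xb, xb, need_weights=False)
        return out.reshape(b, t, d)


# Example: a batch of 2 series, 64 tokens of width 32, blocks of 16 tokens.
# IntraBlockAttention(d_model=32, n_heads=4, block_size=16)(torch.randn(2, 64, 32)).shape -> (2, 64, 32)
```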
Problem

Research questions and friction points this paper is trying to address.

Capturing multiscale temporal dependencies in time series data
Enhancing zero-shot transfer across datasets with divergent patterns
Overcoming limitations of cross-domain architectures in time series models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Interleaved Block Attention for multi-scale dependencies
Intra-block attention enables local information exchange
Inter-block attention captures global temporal pattern interaction (see the sketch after this list)
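As a companion to the intra-block sketch above, the following hypothetical snippet illustrates the inter-block half and one way the two could be interleaved: each block is pooled to a summary token, the summaries attend to one another globally, and the mixed summaries are broadcast back onto the block's tokens. Mean pooling, the residual broadcast, and the HIBALayer name are assumptions for illustration rather than the paper's stated design.

```python
# Hypothetical sketch of inter-block (global) attention and its interleaving
# with a block-local attention module such as the IntraBlockAttention sketch above.
import torch
import torch.nn as nn


class InterBlockAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, block_size: int):
        super().__init__()
        self.block_size = block_size
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); seq_len assumed divisible by block_size
        b, t, d = x.shape
        nb = t // self.block_size
        blocks = x.reshape(b, nb, self.block_size, d)
        summaries = blocks.mean(dim=2)  # one summary token per block
        mixed, _ = self.attn(summaries, summaries, summaries, need_weights=False)
        # Broadcast the globally mixed summaries back to every token in each block.
        return (blocks + mixed.unsqueeze(2)).reshape(b, t, d)


class HIBALayer(nn.Module):
    """Interleaves a local (intra-block) and a global (inter-block) attention module."""

    def __init__(self, local_attn: nn.Module, global_attn: nn.Module):
        super().__init__()
        self.local_attn = local_attn
        self.global_attn = global_attn

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.local_attn(x)   # fine-grained local dynamics
        x = x + self.global_attn(x)  # coarse-grained long-range structure
        return x
```

Stacking such layers with varying depth and width is one plausible way to realize the 9.5M-to-1.5B scaling described above; the actual Xihe configurations are not specified here.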
Yinbo Sun
Ant Group, Hangzhou, China
Yuchen Fang
Ant Group, Hangzhou, China
Zhibo Zhu
Ant Group, Hangzhou, China
Jia Li
Ant Group, Hangzhou, China
Yu Liu
Ant Group, Hangzhou, China
Qiwen Deng
Ant Group, Hangzhou, China
Jun Zhou
Ant Group, Hangzhou, China
Hang Yu
Ant Group, Hangzhou, China
Xingyu Lu
Ant Group, Hangzhou, China
Lintao Ma
Ant Group
bayesian learning, time series analysis, generative models