Byte Pair Encoding for Efficient Time Series Forecasting

📅 2025-05-20
🤖 AI Summary
Existing time-series tokenization methods rely on fixed-length segmentation, leading to redundant tokens for simple patterns (e.g., prolonged constant segments), high computational overhead, and poor semantic adaptability. To address this, we propose the first motif-centric adaptive tokenization framework for time series: (1) it constructs an interpretable, discrete vocabulary grounded in high-frequency local motifs; (2) it introduces a time-series-specific subword tokenization paradigm inspired by Byte Pair Encoding (BPE); and (3) it incorporates a gradient-free conditional decoding strategy for post-hoc optimization. Evaluated on mainstream time-series foundation models, our method achieves an average 36% improvement in forecasting accuracy and a 19.9× inference speedup; conditional decoding further reduces mean squared error (MSE) by up to 44%. The framework significantly enhances model generalizability, interpretability, and computational efficiency.

📝 Abstract
Existing time series tokenization methods predominantly encode a constant number of samples into individual tokens. This inflexible approach can generate excessive tokens for even simple patterns like extended constant values, resulting in substantial computational overhead. Inspired by the success of byte pair encoding, we propose the first pattern-centric tokenization scheme for time series analysis. Based on a discrete vocabulary of frequent motifs, our method merges samples with underlying patterns into tokens, compressing time series adaptively. Exploiting our finite set of motifs and the continuous properties of time series, we further introduce conditional decoding as a lightweight yet powerful post-hoc optimization method, which requires no gradient computation and adds no computational overhead. On recent time series foundation models, our motif-based tokenization improves forecasting performance by 36% and boosts efficiency by 1990% on average. Conditional decoding further reduces MSE by up to 44%. In an extensive analysis, we demonstrate the adaptiveness of our tokenization to diverse temporal patterns, its generalization to unseen data, and its meaningful token representations capturing distinct time series properties, including statistical moments and trends.
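The abstract describes merging samples that share an underlying pattern into single tokens, in the spirit of byte pair encoding. A minimal sketch of that idea, assuming a simple uniform quantization of the series into discrete symbols (the `quantize` and `bpe_merge` helpers are illustrative assumptions, not the paper's actual implementation):

```python
# Hedged sketch: BPE-style merging over a discretized time series.
# The paper's motif construction is not specified here; this only
# illustrates the generic byte-pair-encoding idea applied to symbols
# obtained by uniform binning (an assumption).
from collections import Counter

def quantize(series, n_bins=4, lo=0.0, hi=1.0):
    """Map each sample to a discrete symbol by uniform binning."""
    width = (hi - lo) / n_bins
    return [min(int((x - lo) / width), n_bins - 1) for x in series]

def bpe_merge(symbols, num_merges=3):
    """Repeatedly merge the most frequent adjacent pair into one token."""
    tokens = [(s,) for s in symbols]  # each token is a tuple of symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:  # no pair worth merging
            break
        merges.append((a, b))
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)  # concatenate the two tuples
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

series = [0.1, 0.1, 0.1, 0.1, 0.6, 0.9, 0.1, 0.1]
symbols = quantize(series)         # [0, 0, 0, 0, 2, 3, 0, 0]
tokens, merges = bpe_merge(symbols)
# The long constant run compresses into fewer, longer motif tokens,
# which is the adaptiveness to "extended constant values" the abstract
# highlights.
```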
Problem

Research questions and friction points this paper is trying to address.

Inefficient tokenization in time series forecasting
Excessive computational overhead from inflexible encoding
Lack of adaptive compression for temporal patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pattern-centric tokenization for time series
Conditional decoding without gradient computation
Adaptive compression using frequent motifs
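Conditional decoding is described only at a high level above (gradient-free, exploiting the finite motif vocabulary and the continuity of time series). A hedged sketch of one plausible reading, where candidate motif tokens are reranked at decode time by continuity with the last emitted value; the `candidates`, `last_value`, and `continuity_penalty` names are illustrative assumptions, not the paper's procedure:

```python
# Hedged sketch of gradient-free conditional decoding (an assumption,
# not the paper's exact method): among the model's candidate motif
# tokens, prefer the one whose decoded first sample sits closest to
# the last value already emitted.

def conditional_decode(candidates, last_value, alpha=1.0):
    """candidates: list of (score, motif) pairs, where motif is the
    sequence of real values a token decodes to. Returns the motif
    with the best continuity-adjusted score."""
    def adjusted(item):
        score, motif = item
        continuity_penalty = abs(motif[0] - last_value)
        return score - alpha * continuity_penalty
    return max(candidates, key=adjusted)[1]

# Two candidates with equal model score: the motif that continues
# smoothly from last_value = 0.5 is selected.
cands = [(1.0, [0.9, 1.0]), (1.0, [0.5, 0.6])]
chosen = conditional_decode(cands, last_value=0.5)
```

No gradients are involved: the reranking is a pure post-hoc selection over a finite candidate set, which matches the "no gradient computation, no added overhead" claim in spirit.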
Leon Götz
Volkswagen AG, Technical University of Munich, Munich Data Science Institute
Marcel Kollovieh
Technical University of Munich
Machine Learning · Computer Vision · Medical Imaging
Stephan Günnemann
Technical University of Munich, Munich Data Science Institute
Leo Schwinn
Technical University of Munich
Machine Learning · Deep Learning · Adversarial Attacks