Byte Pair Encoding for Efficient Time Series Forecasting

📅 2025-05-20
🤖 AI Summary
Existing time-series tokenization methods rely on fixed-length segmentation, leading to redundant tokens for simple patterns (e.g., prolonged constant segments), high computational overhead, and poor semantic adaptability. To address this, we propose the first motif-centric adaptive tokenization framework for time series: (1) it constructs an interpretable, discrete vocabulary grounded in high-frequency local motifs; (2) it introduces a time-series-specific subword tokenization paradigm inspired by Byte Pair Encoding (BPE); and (3) it incorporates a gradient-free conditional decoding strategy for post-hoc optimization. Evaluated on mainstream time-series foundation models, our method achieves an average 36% improvement in forecasting accuracy and a 19.9× inference speedup; conditional decoding further reduces mean squared error (MSE) by up to 44%. The framework significantly enhances model generalizability, interpretability, and computational efficiency.

📝 Abstract
Existing time series tokenization methods predominantly encode a constant number of samples into individual tokens. This inflexible approach can generate excessive tokens for even simple patterns like extended constant values, resulting in substantial computational overhead. Inspired by the success of byte pair encoding, we propose the first pattern-centric tokenization scheme for time series analysis. Based on a discrete vocabulary of frequent motifs, our method merges samples with underlying patterns into tokens, compressing time series adaptively. Exploiting our finite set of motifs and the continuous properties of time series, we further introduce conditional decoding as a lightweight yet powerful post-hoc optimization method, which requires no gradient computation and adds no computational overhead. On recent time series foundation models, our motif-based tokenization improves forecasting performance by 36% and boosts efficiency by 1990% on average. Conditional decoding further reduces MSE by up to 44%. In an extensive analysis, we demonstrate the adaptiveness of our tokenization to diverse temporal patterns, its generalization to unseen data, and its meaningful token representations capturing distinct time series properties, including statistical moments and trends.
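The abstract describes merging samples that share an underlying pattern into single tokens, in the spirit of byte pair encoding. A minimal sketch of that idea, assuming a simple uniform quantization of the series into discrete symbols (the `quantize` and `bpe_merge` helpers are illustrative assumptions, not the paper's actual implementation):

```python
# Hedged sketch: BPE-style merging over a discretized time series.
# The paper's motif construction is not specified here; this only
# illustrates the generic byte-pair-encoding idea applied to symbols
# obtained by uniform binning (an assumption).
from collections import Counter

def quantize(series, n_bins=4, lo=0.0, hi=1.0):
    """Map each sample to a discrete symbol by uniform binning."""
    width = (hi - lo) / n_bins
    return [min(int((x - lo) / width), n_bins - 1) for x in series]

def bpe_merge(symbols, num_merges=3):
    """Repeatedly merge the most frequent adjacent pair into one token."""
    tokens = [(s,) for s in symbols]  # each token is a tuple of symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:  # no pair worth merging
            break
        merges.append((a, b))
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)  # concatenate the two tuples
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens, merges

series = [0.1, 0.1, 0.1, 0.1, 0.6, 0.9, 0.1, 0.1]
symbols = quantize(series)         # [0, 0, 0, 0, 2, 3, 0, 0]
tokens, merges = bpe_merge(symbols)
# The long constant run compresses into fewer, longer motif tokens,
# which is the adaptiveness to "extended constant values" the abstract
# highlights.
```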
Problem

Research questions and friction points this paper is trying to address.

Inefficient tokenization in time series forecasting
Excessive computational overhead from inflexible encoding
Lack of adaptive compression for temporal patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pattern-centric tokenization for time series
Conditional decoding without gradient computation
Adaptive compression using frequent motifs
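Conditional decoding is described only at a high level above (gradient-free, exploiting the finite motif vocabulary and the continuity of time series). A hedged sketch of one plausible reading, where candidate motif tokens are reranked at decode time by continuity with the last emitted value; the `candidates`, `last_value`, and `continuity_penalty` names are illustrative assumptions, not the paper's procedure:

```python
# Hedged sketch of gradient-free conditional decoding (an assumption,
# not the paper's exact method): among the model's candidate motif
# tokens, prefer the one whose decoded first sample sits closest to
# the last value already emitted.

def conditional_decode(candidates, last_value, alpha=1.0):
    """candidates: list of (score, motif) pairs, where motif is the
    sequence of real values a token decodes to. Returns the motif
    with the best continuity-adjusted score."""
    def adjusted(item):
        score, motif = item
        continuity_penalty = abs(motif[0] - last_value)
        return score - alpha * continuity_penalty
    return max(candidates, key=adjusted)[1]

# Two candidates with equal model score: the motif that continues
# smoothly from last_value = 0.5 is selected.
cands = [(1.0, [0.9, 1.0]), (1.0, [0.5, 0.6])]
chosen = conditional_decode(cands, last_value=0.5)
```

No gradients are involved: the reranking is a pure post-hoc selection over a finite candidate set, which matches the "no gradient computation, no added overhead" claim in spirit.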
Leon Götz
Volkswagen AG, Technical University of Munich, Munich Data Science Institute
Marcel Kollovieh
Technical University of Munich
Machine Learning · Computer Vision · Medical Imaging
Stephan Günnemann
Technical University of Munich, Munich Data Science Institute
Leo Schwinn
Technical University of Munich
Machine Learning · Deep Learning · Adversarial Attacks