🤖 AI Summary
To address intellectual property theft and the proliferation of deepfake time-series data in large language model–based time series forecasting (LLMTS), this paper proposes Waltz, the first post-hoc watermarking framework tailored for LLMTS. Waltz embeds imperceptible yet detectable watermarks by exploiting the similarity statistics between time series patch embeddings and rarely matched LLM tokens ("cold" tokens). It further designs a z-score–based similarity detection mechanism and a projected-gradient noise-control strategy to jointly ensure reliable detection and minimal prediction perturbation. Extensive experiments across two state-of-the-art LLMTS models and seven benchmark time-series datasets demonstrate that Waltz achieves high watermark detection accuracy (>98%) while preserving original forecasting performance with negligible degradation (MAE change <0.3%). This work establishes a principled trade-off between security and practicality in generative time-series modeling.
📝 Abstract
Large Language Model-based Time Series Forecasting (LLMTS) has shown remarkable promise in handling complex and diverse temporal data, representing a significant step toward foundation models for time series analysis. However, this emerging paradigm introduces two critical challenges. First, the substantial commercial potential and resource-intensive development raise urgent concerns about intellectual property (IP) protection. Second, their powerful time series forecasting capabilities may be misused to produce misleading or fabricated deepfake time series data. To address these concerns, we explore watermarking the outputs of LLMTS models, that is, embedding imperceptible signals into the generated time series data that remain detectable by specialized algorithms. We propose a novel post-hoc watermarking framework, Waltz, which is broadly compatible with existing LLMTS models. Waltz is inspired by the empirical observation that time series patch embeddings are rarely aligned with a specific set of LLM tokens, which we term "cold tokens". Leveraging this insight, Waltz embeds watermarks by rewiring the similarity statistics between patch embeddings and cold token embeddings, and detects watermarks using similarity z-scores. To minimize potential side effects, we introduce a similarity-based embedding position identification strategy and employ projected gradient descent to constrain the watermark noise within a defined boundary. Extensive experiments using two popular LLMTS models across seven benchmark datasets demonstrate that Waltz achieves high watermark detection accuracy with minimal impact on the quality of the generated time series.
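The two detection-side ideas in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the cosine-similarity statistic, the fixed null mean/std, and the L∞ projection radius are all illustrative assumptions; in practice the null distribution would be estimated from unwatermarked data.

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two embedding vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def zscore_detect(patch_embs, cold_token_embs, threshold=3.0):
    """Hypothetical z-score watermark test: flag a watermark when the
    mean similarity between patch embeddings and their closest cold
    token is anomalously high relative to a null distribution."""
    sims = np.array([max(cosine_sim(p, c) for c in cold_token_embs)
                     for p in patch_embs])
    # Assumed null statistics for unwatermarked series (illustrative).
    null_mean, null_std = 0.0, 0.1
    z = (sims.mean() - null_mean) / (null_std / np.sqrt(len(sims)))
    return z > threshold, z

def project_noise(noise, eps=0.05):
    """Projected-gradient-style constraint: after each update, clip the
    watermark noise into an L-infinity ball of radius eps so the
    perturbation to the forecast stays bounded."""
    return np.clip(noise, -eps, eps)
```

A watermarked series whose patch embeddings align with a cold token yields a large z-score, while an unwatermarked one does not; the projection step is what keeps the embedded signal from visibly degrading forecast quality.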