🤖 AI Summary
This work introduces a text-guided, controllable time-series generation task under cross-domain and instance-level constraints. Methodologically, we propose an LLM-driven multi-agent framework for data synthesis and BRIDGE, a hybrid generative architecture that integrates semantic prototypes, text-time alignment modeling, diffusion-based generation, and prototype-aware embedding. The key contribution is combining the semantic understanding of large language models with multi-agent optimization to inject fine-grained domain knowledge and control temporal patterns. Evaluated on 12 cross-domain datasets, BRIDGE achieves state-of-the-art fidelity on 11. Compared with generation without text input, it improves controllability by 12.52% on MSE and 6.34% on MAE. These results suggest that textual guidance improves both generation fidelity and controllability.
📝 Abstract
Time-series Generation (TSG) is a prominent research area with broad applications in simulation, data augmentation, and counterfactual analysis. While existing methods have shown promise in unconditional single-domain TSG, real-world applications demand cross-domain approaches capable of controlled generation tailored to domain-specific constraints and instance-level requirements. In this paper, we argue that text can provide semantic insights, domain information, and instance-specific temporal patterns to guide and improve TSG. We introduce "Text-Controlled TSG", a task focused on generating realistic time series by incorporating textual descriptions. To address data scarcity in this setting, we propose a novel LLM-based multi-agent framework that synthesizes diverse, realistic text-to-TS datasets. Furthermore, we introduce BRIDGE, a hybrid text-controlled TSG framework that integrates semantic prototypes with text descriptions to support domain-level guidance. This approach achieves state-of-the-art generation fidelity on 11 of 12 datasets and improves controllability by 12.52% on MSE and 6.34% on MAE compared with generation without text input, highlighting its potential for generating tailored time-series data.
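The controllability figures above are percentage improvements in MSE and MAE over generation without text input. The sketch below shows one way such a percentage can be computed, assuming it denotes a relative error reduction against the no-text baseline; that reading is an assumption, and the abstract does not give the exact evaluation protocol. The function names and the toy series are hypothetical and are not taken from the paper.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def mae(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical toy values for illustration only (not results from the paper).
target = [0.0, 1.0, 2.0, 3.0]          # reference series described by the text
no_text = [0.5, 1.5, 2.5, 3.5]         # sample generated without text input
with_text = [0.25, 1.25, 2.25, 3.25]   # sample generated with text input

mse_gain = relative_reduction(mse(no_text, target), mse(with_text, target))
mae_gain = relative_reduction(mae(no_text, target), mae(with_text, target))
print(f"MSE reduction: {mse_gain:.2f}%")
print(f"MAE reduction: {mae_gain:.2f}%")
```

Under this reading, a larger percentage means the text-conditioned sample follows the described series more closely than the sample generated without text.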