🤖 AI Summary
This study addresses three obstacles that limit existing diffusion models in synthesizing treasury (government bond) futures data: data scarcity, strong dependence on market conditions, and complex cross-channel correlations among the variables. To overcome these limitations, the authors propose TF-CoDiT, the first language-controllable generative framework for treasury futures time series. TF-CoDiT combines a Diffusion Transformer (DiT), the Discrete Wavelet Transform (DWT), and a U-shaped Variational Autoencoder (VAE) that models cross-channel dependencies in the latent space. The authors also introduce the Financial Market Attribute Protocol (FinMAP), which generates structured conditional prompts. Evaluated on four types of treasury futures data from 2015 to 2025, the method keeps errors at or below 0.433 (MSE) and 0.453 (MAE) and remains robust across contracts and temporal horizons.
📝 Abstract
Diffusion Transformers (DiT) have achieved milestones in synthesizing financial time-series data, such as stock prices and order flows. However, their performance in synthesizing treasury futures data remains underexplored. This work emphasizes the characteristics of treasury futures data, including its low data volume, market dependencies, and the grouped correlations among its multiple variables. To overcome these challenges, we propose TF-CoDiT, the first DiT framework for language-controlled treasury futures synthesis. To facilitate low-data learning, TF-CoDiT adapts the standard DiT by transforming multi-channel 1-D time series into Discrete Wavelet Transform (DWT) coefficient matrices. A U-shaped VAE is proposed to encode cross-channel dependencies hierarchically into a latent variable and to bridge the latent and DWT spaces through decoding, thereby enabling latent diffusion generation. To derive prompts that cover essential conditions, we introduce the Financial Market Attribute Protocol (FinMAP), a multi-level description system that standardizes daily and periodical market dynamics by recognizing 17 and 23 economic indicators from 7 and 8 perspectives, respectively. In our experiments, we gather four types of treasury futures data covering the period from 2015 to 2025, and define data synthesis tasks with durations ranging from one week to four months. Extensive evaluations demonstrate that TF-CoDiT can produce highly authentic data, with errors relative to the ground truth of at most 0.433 (MSE) and 0.453 (MAE). Further studies demonstrate the robustness of TF-CoDiT across contracts and temporal horizons.
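To make the DWT step concrete, the following is a minimal sketch of how a multi-channel 1-D series could be turned into a coefficient matrix and recovered again. The abstract does not specify the wavelet family, the decomposition depth, or how coefficients are laid out, so the orthonormal Haar wavelet, the `[cA_L, cD_L, ..., cD_1]` row layout, the function names (`haar_dwt`, `haar_idwt`, `to_coefficient_matrix`), and the sample values are all assumptions made for illustration, not the authors' implementation.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT of one channel.

    Returns a flat list ordered [cA_L, cD_L, ..., cD_1] with the same
    length as the input (assumed layout, chosen for illustration).
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = [float(v) for v in signal]
    details = []  # coarsest level is kept first
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    coeffs = approx
    for d in details:
        coeffs = coeffs + d
    return coeffs


def haar_idwt(coeffs, levels):
    """Inverse of haar_dwt: rebuild the channel from its coefficients."""
    size = len(coeffs) // (2 ** levels)
    approx = list(coeffs[:size])
    pos = size
    for _ in range(levels):
        detail = coeffs[pos:pos + size]
        merged = []
        for a, d in zip(approx, detail):
            merged.append((a + d) / SQRT2)
            merged.append((a - d) / SQRT2)
        approx = merged
        pos += size
        size *= 2
    return approx


def to_coefficient_matrix(series, levels):
    """Map a (channels x time) series to a (channels x time) DWT matrix."""
    return [haar_dwt(channel, levels) for channel in series]


# Hypothetical two-channel example (invented values: a price and a volume).
series = [
    [100.0, 100.5, 101.0, 100.8, 100.2, 100.4, 100.9, 101.3],
    [12.0, 15.0, 11.0, 18.0, 14.0, 13.0, 17.0, 16.0],
]
matrix = to_coefficient_matrix(series, levels=2)
restored = [haar_idwt(row, levels=2) for row in matrix]
```

Because the Haar transform is orthonormal and exactly invertible, a model that generates coefficient matrices of this shape can be mapped back to the time domain without loss, which is the property a decoder bridging the latent and DWT spaces would rely on.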