🤖 AI Summary
Financial time-series modeling is hindered by the scarcity, low quality, and limited diversity of real-world data, which degrades the performance of downstream trading and investment models. To address this, we propose Fiaingen, an efficient, high-fidelity financial time-series generation framework tailored to domain-specific characteristics. Fiaingen integrates manifold alignment, adversarial training, and a lightweight neural architecture to increase the distributional overlap between synthetic and real data within a low-dimensional embedding space, jointly optimizing generation efficiency, distributional fidelity, and downstream task utility. Extensive experiments demonstrate that Fiaingen achieves state-of-the-art distributional similarity (e.g., in Wasserstein distance and MMD), enables predictive and trading models trained on its synthetic data to approach the performance of those trained on real data, and generates high-quality sequences in seconds per sample. The method thus delivers high fidelity, strong generalization across assets and regimes, and practical scalability for real-world deployment.
📝 Abstract
Data is vital in enabling machine learning models to advance research and practical applications in finance, where accurate and robust models are essential for investment and trading decision-making. However, real-world data is limited in quantity, quality, and variety. This shortage of data for various financial assets directly hinders the performance of machine learning models designed to trade and invest in those assets. Generative methods can mitigate this shortage. In this paper, we introduce a set of novel techniques for time series data generation (we name them Fiaingen) and assess their performance across three criteria: (a) overlap of real-world and synthetic data in a reduced-dimensionality space, (b) performance on downstream machine learning tasks, and (c) runtime performance. Our experiments demonstrate that the methods achieve state-of-the-art performance across all three criteria. Synthetic data generated with Fiaingen methods more closely mirrors the original time series data while keeping data generation time on the order of seconds, which ensures the scalability of the proposed approach. Furthermore, models trained on this synthetic data achieve performance close to that of models trained on real-world data.