🤖 AI Summary
Financial time series data often suffer from concept drift and non-stationary distributions, which degrade model generalization. To address this, this work proposes a drift-aware adaptive data stream system that, for the first time, unifies data augmentation, curriculum learning, and data workflow management within a differentiable framework, enabling gradient-based bilevel optimization and traceable replay. The system incorporates parameterized data operation modules—including single-asset transformations, multi-asset mixing, and data filtering—together with an adaptive planner that dynamically adjusts augmentation strategies to align with evolving market conditions. Evaluated on both predictive modeling and reinforcement learning–based trading tasks, the approach significantly enhances model robustness and risk-adjusted returns.
📝 Abstract
In quantitative finance, the gap between training and real-world performance, driven by concept drift and distributional non-stationarity, remains a critical obstacle to building reliable data-driven systems. Models trained on static historical data often overfit, resulting in poor generalization in dynamic markets. The mantra "History Is Not Enough" underscores the need for adaptive data generation that learns to evolve with the market rather than relying solely on past observations. We present a drift-aware dataflow system that integrates machine learning-based adaptive control into the data curation process. The system couples a parameterized data manipulation module, comprising single-stock transformations, multi-stock mix-ups, and curation operations, with an adaptive planner-scheduler that employs gradient-based bi-level optimization to control the system. This design unifies data augmentation, curriculum learning, and data workflow management under a single differentiable framework, enabling provenance-aware replay and continuous data quality monitoring. Extensive experiments on forecasting and reinforcement learning trading tasks demonstrate that our framework enhances model robustness and improves risk-adjusted returns. The system provides a generalizable approach to adaptive data management and learning-guided workflow automation for financial data.
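To make the bi-level idea concrete, the following is a toy sketch and not the authors' system. It assumes a single augmentation parameter, a mix-up weight `lam` between two assets; a closed-form least-squares slope as the inner "task model"; and a central finite difference as a stand-in for the gradient-based hypergradient the abstract describes. All function names and data here are hypothetical illustrations.

```python
# Toy bi-level loop: the outer level tunes a mix-up weight, the inner level fits a model.
# All names and numbers are illustrative assumptions, not the paper's API.

def mixup(xa, ya, xb, yb, lam):
    """Blend paired samples from two assets with mixing weight lam."""
    xs = [lam * a + (1.0 - lam) * b for a, b in zip(xa, xb)]
    ys = [lam * a + (1.0 - lam) * b for a, b in zip(ya, yb)]
    return xs, ys

def fit_slope(xs, ys):
    """Inner problem: closed-form least-squares slope with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def val_loss(w, xv, yv):
    """Mean squared error of slope w on held-out (recent) data."""
    return sum((w * x - y) ** 2 for x, y in zip(xv, yv)) / len(xv)

def outer_objective(lam, train_a, train_b, val):
    """Validation loss of the model trained on data augmented with lam."""
    xs, ys = mixup(*train_a, *train_b, lam)
    return val_loss(fit_slope(xs, ys), *val)

def plan_augmentation(train_a, train_b, val, lam=0.9, lr=0.05, steps=50, eps=1e-4):
    """Outer problem: descend the validation loss with respect to lam.

    A central finite difference replaces automatic differentiation here.
    """
    for _ in range(steps):
        up = outer_objective(lam + eps, train_a, train_b, val)
        down = outer_objective(lam - eps, train_a, train_b, val)
        grad = (up - down) / (2.0 * eps)
        lam = min(1.0, max(0.0, lam - lr * grad))  # keep lam in [0, 1]
    return lam

# Hypothetical data: asset A follows y = 1x, asset B follows y = 3x,
# and the recent validation regime follows y = 2x.
x = [0.5, 1.0, 1.5, 2.0]
train_a = (x, [1.0 * v for v in x])
train_b = (x, [3.0 * v for v in x])
val = (x, [2.0 * v for v in x])

lam_star = plan_augmentation(train_a, train_b, val)
print(round(lam_star, 4))
```

In this toy setup the mixed training data has slope `3 - 2 * lam`, so the validation regime is matched at `lam = 0.5`, and the planner moves `lam` from its starting value of 0.9 toward that point. A system like the one described would presumably differentiate through many parameterized operations at once, which this one-parameter sketch does not attempt.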