🤖 AI Summary
This paper addresses a common imbalance in time series forecasting: effort goes into optimizing model architectures while data quality is neglected. To close this gap, the authors propose DCATS, a metadata-driven framework in which an LLM agent automates time series data cleaning. DCATS uses the structured and unstructured metadata accompanying a time series to guide the agent through core cleaning tasks, including anomaly detection, missing-value imputation, and sampling alignment, while optimizing forecasting performance across diverse forecasting models. Unlike conventional AutoML approaches, which concentrate on feature engineering and model architecture search, DCATS applies an LLM agent to the data-quality stage of the pipeline, moving AutoML toward a “data-first” paradigm. On a large-scale traffic volume forecasting dataset, DCATS achieves an average 6% reduction in prediction error across the four forecasting models and the prediction horizons tested.
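To make "metadata-guided cleaning tuned by forecasting performance" concrete, here is a minimal Python sketch under loudly stated assumptions. It is not DCATS's implementation: the LLM agent is replaced by a hypothetical rule-based function `propose_pipelines`, and the `capacity` metadata field, the operators `mask_over_capacity` and `impute_linear`, and the window-mean forecaster are all illustrative inventions. Sampling alignment is omitted. Each candidate cleaning pipeline is applied to the training split and scored by the forecaster's mean absolute error (MAE) on a held-out split, and the lowest-error pipeline is kept.

```python
from statistics import mean

# All names below are hypothetical illustrations, not DCATS's actual API.

def impute_linear(series):
    """Fill missing values (None) by linear interpolation; edges copy the nearest known value."""
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    if not known:
        return out
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] * (1 - w) + series[right] * w
    return out

def mask_over_capacity(series, capacity):
    """Treat readings above a metadata-given capacity as anomalies and mark them missing."""
    return [None if v is not None and v > capacity else v for v in series]

def propose_pipelines(metadata):
    """Rule-based stand-in for the LLM agent: turn metadata into candidate cleaning pipelines."""
    candidates = {"raw": [], "impute": [impute_linear]}
    if "capacity" in metadata:
        cap = metadata["capacity"]
        candidates["mask+impute"] = [lambda s: mask_over_capacity(s, cap), impute_linear]
    return candidates

def forecast_window_mean(train, horizon, window=4):
    """Toy forecaster: repeat the mean of the last `window` observed values."""
    observed = [v for v in train if v is not None]
    return [mean(observed[-window:])] * horizon

def mae(pred, target):
    """Mean absolute error over the target points that are not missing."""
    pairs = [(p, t) for p, t in zip(pred, target) if t is not None]
    return sum(abs(p - t) for p, t in pairs) / len(pairs)

def select_pipeline(train, valid, metadata):
    """Clean the training split with each candidate, score by validation MAE, keep the best."""
    scores = {}
    for name, steps in propose_pipelines(metadata).items():
        cleaned = list(train)
        for step in steps:
            cleaned = step(cleaned)
        scores[name] = mae(forecast_window_mean(cleaned, len(valid)), valid)
    best = min(scores, key=scores.get)
    return best, scores
```

For example, with `train = [10, 12, None, 14, 500, 13, 15]`, `valid = [14, 15, 16]` and `metadata = {"capacity": 100}`, `select_pipeline` returns `"mask+impute"`: masking the spike of 500 as an anomaly and interpolating the gaps gives a validation MAE of 1.125, versus 120.5 for the raw and impute-only pipelines. In an agent-based system, the candidate list would come from the LLM reading the metadata rather than from a fixed rule.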
📝 Abstract
Large Language Model (LLM)-powered agents have emerged as effective planners for Automated Machine Learning (AutoML) systems. While most existing AutoML approaches focus on automating feature engineering and model architecture search, recent studies in time series forecasting suggest that lightweight models can often achieve state-of-the-art performance. This observation led us to explore improving data quality, rather than model architecture, as a potentially fruitful direction for AutoML on time series data. We propose DCATS, a Data-Centric Agent for Time Series. DCATS leverages the metadata accompanying time series to clean the data while optimizing forecasting performance. We evaluated DCATS with four time series forecasting models on a large-scale traffic volume forecasting dataset. The results show that DCATS achieves an average 6% error reduction across all tested models and time horizons, highlighting the potential of data-centric approaches in AutoML for time series forecasting.
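The headline figure, an average 6% error reduction across all tested models and time horizons, can be read as an average over (model, horizon) cells. The abstract does not say exactly how that average is formed, so the sketch below shows two common conventions as an assumption: the mean of per-cell relative reductions, and the relative reduction of the pooled (summed) errors. The dictionary layout and the placeholder numbers are hypothetical and are not results from the paper.

```python
def avg_relative_reduction(baseline, cleaned):
    """Mean over (model, horizon) cells of the relative reduction 1 - cleaned/baseline."""
    cells = list(baseline)
    return sum(1 - cleaned[k] / baseline[k] for k in cells) / len(cells)

def pooled_relative_reduction(baseline, cleaned):
    """Alternative convention: relative reduction of the errors summed over all cells."""
    total_base = sum(baseline.values())
    total_clean = sum(cleaned[k] for k in baseline)
    return (total_base - total_clean) / total_base

# Placeholder errors keyed by (model, horizon); illustrative only, NOT the paper's results.
baseline = {("model_a", 1): 10.0, ("model_a", 2): 20.0}
cleaned = {("model_a", 1): 9.0, ("model_a", 2): 19.0}
```

With these placeholder values the per-cell mean is 7.5% while the pooled figure is about 6.7%, because pooling weights each cell by the size of its baseline error. When comparing headline reductions across papers, it helps to know which convention was used.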