TimeRecipe: A Time-Series Forecasting Recipe via Benchmarking Module Level Effectiveness

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
In time-series forecasting, the effectiveness of key design components—such as decomposition and normalization—varies significantly across data characteristics and task settings; however, existing benchmarks lack the fine-grained attribution analysis needed to quantify individual component contributions.

Method: We introduce the first module-level interpretable benchmarking framework, systematically evaluating over 10,000 architectural configurations across multiple datasets, forecast horizons, and task types. Leveraging modular experimental orchestration and rigorous controlled-variable analysis, we isolate the impact of each component under diverse scenarios.

Contribution/Results: Our analysis uncovers strong, previously undocumented correlations between component performance and forecasting scenarios, enabling empirically grounded architecture recommendations. Experiments yield a lightweight state-of-the-art ensemble model, produce a generalizable component-effectiveness atlas, and release a fully open-sourced toolchain—including benchmark infrastructure, evaluation scripts, and standardized configuration templates—to support reproducible, component-aware forecasting research.
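The controlled-variable analysis described above can be sketched as a full grid over independent component axes, then averaging scores while fixing one component value to attribute its contribution. This is a minimal illustrative sketch only: the component names, `evaluate` stub, and scoring are placeholders, not TimeRecipe's actual API or results.

```python
from itertools import product

# Hypothetical component axes; each is toggled independently so a
# controlled-variable comparison can isolate its contribution.
COMPONENT_SPACE = {
    "decomposition": [None, "moving_avg", "stl"],
    "normalization": [None, "revin", "instance_norm"],
    "backbone": ["linear", "mlp", "transformer"],
}

def evaluate(config, dataset, horizon):
    """Stand-in for training/evaluating one configuration (returns a toy score)."""
    return sum(hash((k, v, dataset, horizon)) % 100
               for k, v in config.items()) / 100.0

def run_grid(dataset, horizon):
    """Exhaustively evaluate every combination of component choices."""
    keys = list(COMPONENT_SPACE)
    results = {}
    for combo in product(*(COMPONENT_SPACE[k] for k in keys)):
        results[combo] = evaluate(dict(zip(keys, combo)), dataset, horizon)
    return results

def component_effect(results, keys, component, value):
    """Mean score over all configs that fix one component value:
    the controlled-variable attribution for that design choice."""
    idx = keys.index(component)
    scores = [s for combo, s in results.items() if combo[idx] == value]
    return sum(scores) / len(scores)

keys = list(COMPONENT_SPACE)
results = run_grid("ETTh1", horizon=96)
effect = component_effect(results, keys, "normalization", "revin")
```

Averaging over the full grid (rather than a single paired comparison) marginalizes out the other components, which is what makes per-component attribution meaningful across settings.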

📝 Abstract
Time-series forecasting is an essential task with wide real-world applications across domains. While recent advances in deep learning have enabled time-series forecasting models with accurate predictions, there remains considerable debate over which architectures and design components, such as series decomposition or normalization, are most effective under varying conditions. Existing benchmarks primarily evaluate models at a high level, offering limited insight into why certain designs work better. To mitigate this gap, we propose TimeRecipe, a unified benchmarking framework that systematically evaluates time-series forecasting methods at the module level. TimeRecipe conducts over 10,000 experiments to assess the effectiveness of individual components across a diverse range of datasets, forecasting horizons, and task settings. Our results reveal that exhaustive exploration of the design space can yield models that outperform existing state-of-the-art methods and uncover meaningful intuitions linking specific design choices to forecasting scenarios. Furthermore, we release a practical toolkit within TimeRecipe that recommends suitable model architectures based on these empirical insights. The benchmark is available at: https://github.com/AdityaLab/TimeRecipe.
Problem

Research questions and friction points this paper is trying to address.

Evaluating the module-level effectiveness of time-series forecasting components
Identifying which architectures and design components work best under varying conditions
Providing empirical insights and a practical toolkit for model architecture recommendations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematically evaluates module-level forecasting components
Conducts 10,000+ experiments across diverse datasets
Recommends model architectures via empirical insights
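The architecture-recommendation idea above can be sketched as a rule-based lookup from observed data characteristics to empirically well-performing components. The trait names, rules, and recommended values here are illustrative assumptions, not the released toolkit's API or the paper's actual findings.

```python
def recommend(traits):
    """Map dataset/task traits to suggested design components.

    traits: dict with illustrative keys such as 'seasonal',
    'horizon', and 'distribution_shift' (hypothetical names).
    """
    # Default starting point (placeholder choices).
    rec = {"normalization": "instance_norm", "backbone": "mlp"}
    if traits.get("seasonal"):
        # Seasonal data: add an explicit decomposition module.
        rec["decomposition"] = "seasonal_trend"
    if traits.get("horizon", 0) >= 336:
        # Long-horizon setting: prefer a simpler backbone.
        rec["backbone"] = "linear"
    if traits.get("distribution_shift"):
        # Non-stationary series: reversible instance normalization.
        rec["normalization"] = "revin"
    return rec

suggestion = recommend({"seasonal": True, "horizon": 720})
```

In practice such rules would be distilled from the benchmark's component-effectiveness results rather than hand-written.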