🤖 AI Summary
In time-series forecasting, the effectiveness of key design components—such as decomposition and normalization—varies significantly across data characteristics and task settings; however, existing benchmarks lack fine-grained attribution analysis to quantify individual component contributions.
Method: We introduce the first module-level interpretable benchmarking framework, systematically evaluating over 10,000 architectural configurations across multiple datasets, forecast horizons, and task types. Leveraging modular experimental orchestration and rigorous controlled-variable analysis, we isolate the impact of each component under diverse scenarios.
Contribution/Results: Our analysis uncovers strong, previously undocumented correlations between component performance and forecasting scenarios, enabling empirically grounded architecture recommendations. Our experiments yield a lightweight ensemble model with state-of-the-art performance, produce a generalizable component-effectiveness atlas, and release a fully open-source toolchain (benchmark infrastructure, evaluation scripts, and standardized configuration templates) to support reproducible, component-aware forecasting research.
📝 Abstract
Time-series forecasting is an essential task with wide real-world applications across domains. While recent advances in deep learning have produced time-series forecasting models capable of accurate predictions, there remains considerable debate over which architectures and design components, such as series decomposition or normalization, are most effective under varying conditions. Existing benchmarks primarily evaluate models at a high level, offering limited insight into why certain designs work better. To bridge this gap, we propose TimeRecipe, a unified benchmarking framework that systematically evaluates time-series forecasting methods at the module level. TimeRecipe conducts over 10,000 experiments to assess the effectiveness of individual components across a diverse range of datasets, forecasting horizons, and task settings. Our results reveal that exhaustive exploration of the design space can yield models that outperform existing state-of-the-art methods and uncover meaningful intuitions linking specific design choices to forecasting scenarios. Furthermore, we release a practical toolkit within TimeRecipe that recommends suitable model architectures based on these empirical insights. The benchmark is available at: https://github.com/AdityaLab/TimeRecipe.
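To make the module-level evaluation idea concrete, the following is a minimal, purely illustrative sketch of a controlled-variable sweep over two toggleable components (a moving-average series decomposition and a per-series instance normalization). All function names and the naive last-value forecaster are hypothetical stand-ins, not the actual TimeRecipe API.

```python
# Illustrative sketch of a module-level component sweep.
# Component implementations and names are assumptions for demonstration only.
from itertools import product

def moving_average(series, window=3):
    """Extract a trend component via a simple moving average (decomposition)."""
    padded = [series[0]] * (window - 1) + list(series)
    return [sum(padded[i:i + window]) / window for i in range(len(series))]

def instance_normalize(series):
    """Per-series normalization; returns normalized values plus stats to invert."""
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    std = var ** 0.5 or 1.0  # guard against constant series
    return [(x - mean) / std for x in series], mean, std

def naive_forecast(series, horizon):
    """Placeholder forecaster: repeat the last observed value."""
    return [series[-1]] * horizon

def run_config(series, horizon, use_decomposition, use_normalization):
    """Apply the toggled components, forecast, then invert the normalization."""
    mean, std = 0.0, 1.0
    if use_normalization:
        series, mean, std = instance_normalize(series)
    if use_decomposition:
        trend = moving_average(series)
        residual = [x - t for x, t in zip(series, trend)]
        pred = [t + r for t, r in zip(naive_forecast(trend, horizon),
                                      naive_forecast(residual, horizon))]
    else:
        pred = naive_forecast(series, horizon)
    return [p * std + mean for p in pred]

# Enumerate the toy design space, mirroring a controlled-variable sweep:
# each configuration differs in exactly one component toggle at a time.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
for decomp, norm in product([False, True], repeat=2):
    preds = run_config(series, 2, decomp, norm)
    print(f"decomposition={decomp} normalization={norm} -> "
          f"{[round(p, 2) for p in preds]}")
```

In a real benchmark the placeholder forecaster would be replaced by trained models, and the per-configuration errors on held-out data would feed the kind of component-effectiveness attribution the paper describes.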