🤖 AI Summary
This work addresses the performance degradation of Transformers in Web-scale time-series forecasting, which the authors attribute to error accumulation, sensitivity to out-of-distribution data, and susceptibility to saddle points in high-dimensional optimization landscapes. To mitigate these issues, they propose a lightweight Transformer architecture co-optimized with a novel Escape-Explore Optimizer (EEO). EEO strengthens exploration in high-dimensional parameter spaces, helping the model escape saddle points and sharp minima and thereby improving cross-task generalization. Combined with multivariate long-sequence modeling and multimodal feature fusion, the proposed method matches state-of-the-art performance across 11 time-series benchmarks and the Synapse medical image segmentation task, demonstrating strong generalization and stability.
📝 Abstract
Transformer-based foundation models have achieved remarkable progress in tasks such as time-series forecasting and image segmentation. However, they frequently suffer from error accumulation in multivariate long-sequence prediction and exhibit vulnerability to out-of-distribution samples in image-related tasks. Furthermore, these challenges become particularly pronounced in large-scale Web data analysis tasks, which typically involve complex temporal patterns and multimodal features. This complexity substantially increases optimization difficulty, rendering models prone to stagnation at saddle points within high-dimensional parameter spaces. To address these issues, we propose a lightweight Transformer architecture in conjunction with a novel Escape-Explore Optimizer (EEO). The optimizer enhances both exploration and generalization while effectively avoiding sharp minima and saddle-point traps. Experimental results show that, in representative Web data scenarios, our method achieves performance on par with state-of-the-art models across 11 time-series benchmark datasets and the Synapse medical image segmentation task. Moreover, it demonstrates superior generalization and stability, thereby validating its potential as a versatile cross-task foundation model for Web-scale data mining and analysis.
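The abstract does not spell out how EEO escapes saddle points, so as an illustration of the general idea, here is a minimal sketch of a well-known saddle-escape mechanism, perturbed gradient descent: when the gradient norm is nearly zero (a possible saddle or flat region), isotropic noise is injected to push the iterate into a descent direction. All names and hyperparameters below (`perturbed_gd_step`, `grad_threshold`, `noise_radius`) are illustrative assumptions, not the authors' actual EEO algorithm.

```python
import numpy as np

def perturbed_gd_step(params, grad, lr=0.01, grad_threshold=1e-3,
                      noise_radius=1e-2, rng=None):
    """One step of perturbed gradient descent (illustrative, not EEO):
    if the gradient is nearly zero, a candidate saddle point, inject
    isotropic Gaussian noise to encourage escape, then take a plain
    gradient step."""
    rng = rng or np.random.default_rng()
    if np.linalg.norm(grad) < grad_threshold:
        params = params + rng.normal(scale=noise_radius, size=params.shape)
    return params - lr * grad

# Toy objective f(x, y) = x^2 - y^2, which has a saddle at the origin:
# plain gradient descent started exactly there would never move.
def grad_f(p):
    x, y = p
    return np.array([2 * x, -2 * y])

rng = np.random.default_rng(0)   # fixed seed for reproducibility
p = np.zeros(2)                  # start exactly at the saddle
for _ in range(200):
    p = perturbed_gd_step(p, grad_f(p), rng=rng)
```

After the first noisy kick, the iterate grows along the unstable `y` direction and leaves the saddle, while the stable `x` component decays toward zero. The same intuition, adding controlled stochastic exploration near flat regions, underlies many escape-style optimizers.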