🤖 AI Summary
Mainstream time-series models rely on complex sequence mixers, such as self-attention or MLP-Mixers, yet the necessity of these mixers has not been systematically validated.
Method: Grounded in the MatrixMixer framework, which abstracts diverse sequence mixers as learnable mixing matrices, we propose JustDense, an empirical study that, for the first time, systematically replaces these mixers with fully connected layers (i.e., dense weight matrices), evaluated at scale across 29 benchmark tasks.
Contribution/Results: Our experiments demonstrate that the simplified fully connected architecture matches or surpasses state-of-the-art methods on five canonical time-series tasks (forecasting, classification, imputation, anomaly detection, and regression), challenging the implicit assumption that deeper or more complex architectures inherently yield superior performance. This work establishes a path toward efficient, interpretable time-series modeling by showing that sophisticated sequence-mixing mechanisms are not strictly necessary for strong empirical performance.
📝 Abstract
Sequence and channel mixers, the core mechanisms of sequence models, have become the de facto standard in time-series analysis (TSA). However, recent studies have questioned the necessity of complex sequence mixers such as attention mechanisms, demonstrating that simpler architectures can achieve comparable or even superior performance. This suggests that the benefits attributed to complex sequence mixers may instead stem from other architectural or optimization factors. Based on this observation, we pose a central question: are common sequence mixers necessary for time-series analysis? To answer it, we propose JustDense, an empirical study that systematically replaces the sequence mixers in various well-established TSA models with dense layers. Grounded in the MatrixMixer framework, JustDense treats any sequence mixer as a mixing matrix and replaces it with a dense layer. This substitution isolates the mixing operation, providing a clear theoretical foundation for understanding its role. To address our research question, we conducted extensive experiments on 29 benchmarks covering five representative TSA tasks, using seven state-of-the-art TSA models. The results show that replacing sequence mixers with dense layers yields comparable or even superior performance. Even in the cases where dedicated sequence mixers still offer benefits, JustDense challenges the assumption that "deeper and more complex architectures are inherently better" in TSA.
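To make the abstraction concrete, the following is a minimal NumPy sketch of the MatrixMixer view described in the abstract: any sequence mixer can be written as a mixing matrix applied along the time axis, and the JustDense substitution swaps an input-dependent matrix (e.g., attention) for a plain learnable dense matrix. All names, shapes, and initializations here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
L, D = 8, 4                      # sequence length, number of channels
X = rng.standard_normal((L, D))  # one time-series window: L steps x D channels

# MatrixMixer view: a sequence mixer is Y = M @ X for some L x L matrix M
# acting along the time axis.

# (1) Attention-style mixer: M is input-dependent (softmax of similarity scores).
Wq = rng.standard_normal((D, D))
Wk = rng.standard_normal((D, D))
scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(D)
M_attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
M_attn /= M_attn.sum(axis=-1, keepdims=True)   # rows sum to 1
Y_attn = M_attn @ X

# (2) Dense substitution: M is a fixed learnable weight matrix, independent of
#     the input -- i.e., a fully connected layer over the time dimension.
M_dense = 0.1 * rng.standard_normal((L, L))
Y_dense = M_dense @ X

# Both mixers produce outputs of the same shape, which is what lets the
# substitution isolate the mixing operation itself.
print(Y_attn.shape, Y_dense.shape)
```

Because both operations share the interface Y = M @ X, the dense variant can replace the attention variant inside an existing architecture without touching the surrounding channel-mixing or normalization layers.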