🤖 AI Summary
Existing lightweight spatiotemporal forecasting models struggle to simultaneously achieve high dynamic-modeling accuracy and deep semantic–causal understanding. To address this, we propose Semantic–Spectral Knowledge Distillation (S^2-KD): a framework built around a multimodal teacher model that jointly integrates text-based semantic priors, generated by a Large Multimodal Model (LMM), with disentangled high- and low-frequency spectral features extracted by a spectral decoupling network. A unified semantic–spectral distillation loss guides a purely visual student model to implicitly learn causal semantics and multiscale time–frequency patterns. S^2-KD is the first approach to synergistically incorporate both semantic priors and spectral representations into knowledge distillation, and it requires no textual input or additional inference overhead. Evaluated on benchmarks including WeatherBench and TaxiBJ+, the student model achieves significant improvements over state-of-the-art methods in long-horizon forecasting and non-stationary scenarios, empirically validating the efficacy of semantic-guided spectral modeling.
📝 Abstract
Spatiotemporal forecasting often relies on computationally intensive models to capture complex dynamics. Knowledge distillation (KD) has emerged as a key technique for creating lightweight student models, with recent advances like frequency-aware KD successfully preserving spectral properties (i.e., high-frequency details and low-frequency trends). However, these methods are fundamentally constrained by operating on pixel-level signals, leaving them blind to the rich semantic and causal context behind the visual patterns. To overcome this limitation, we introduce S^2-KD, a novel framework that unifies Semantic priors with Spectral representations for distillation. Our approach begins by training a privileged, multimodal teacher model. This teacher leverages textual narratives from a Large Multimodal Model (LMM) to reason about the underlying causes of events, while its architecture simultaneously decouples spectral components in its latent space. The core of our framework is a new distillation objective that transfers this unified semantic-spectral knowledge into a lightweight, vision-only student. Consequently, the student learns to make predictions that are not only spectrally accurate but also semantically coherent, without requiring any textual input or architectural overhead at inference. Extensive experiments on benchmarks like WeatherBench and TaxiBJ+ show that S^2-KD significantly boosts the performance of simple student models, enabling them to outperform state-of-the-art methods, particularly in long-horizon and complex non-stationary scenarios.
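To make the distillation objective concrete, here is a minimal sketch of what a unified semantic–spectral loss could look like. This is an illustrative assumption, not the paper's actual implementation: the spectral split uses a simple FFT low-pass cutoff, the weights `w_low`, `w_high`, `w_sem` are hypothetical hyperparameters, and the semantic term is modeled as a cosine distance between teacher and student latent embeddings.

```python
import numpy as np

def spectral_split(x, cutoff=0.25):
    """Split a 1-D signal into low- and high-frequency parts via an FFT mask.

    `cutoff` is the fraction of rFFT bins kept in the low-frequency band
    (a stand-in for the paper's spectral decoupling network).
    """
    X = np.fft.rfft(x)
    k = int(len(X) * cutoff)
    low = X.copy()
    low[k:] = 0.0                      # zero out high-frequency bins
    high = X - low                     # complementary high-frequency residual
    n = len(x)
    return np.fft.irfft(low, n=n), np.fft.irfft(high, n=n)

def ss_kd_loss(student_pred, teacher_pred, student_emb, teacher_emb,
               w_low=1.0, w_high=1.0, w_sem=0.5):
    """Hypothetical unified semantic-spectral distillation loss.

    Matches student to teacher separately in the low band (trends) and the
    high band (details), plus a semantic-alignment term on latent embeddings.
    """
    s_low, s_high = spectral_split(student_pred)
    t_low, t_high = spectral_split(teacher_pred)
    loss_low = np.mean((s_low - t_low) ** 2)
    loss_high = np.mean((s_high - t_high) ** 2)
    # Semantic term: cosine distance between student/teacher embeddings.
    cos = np.dot(student_emb, teacher_emb) / (
        np.linalg.norm(student_emb) * np.linalg.norm(teacher_emb))
    loss_sem = 1.0 - cos
    return w_low * loss_low + w_high * loss_high + w_sem * loss_sem
```

A sanity check on the design: when the student exactly reproduces the teacher's prediction and embedding, all three terms vanish, so the loss is zero; mismatches in trends, details, or semantics each contribute independently through their own weight.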