🤖 AI Summary
Although ensembles of entropy-optimal sparse probabilistic approximation (eSPA) models achieve state-of-the-art ENSO phase prediction skill, they are harder to interpret than individual eSPA models, which hinders understanding of the mechanisms governing long-range predictability. To address this limitation, the work proposes a distillation framework that retains only the ensemble members that make correct predictions and aggregates their structure into a compact, diagnosable model for each forecast lead time, with ENSO phase forecasts up to 24 months in advance. The distilled models preserve the ensemble's forecast performance, capture the spatiotemporal evolution of ENSO, produce spatial importance maps that include known physical precursors at certain lead times, and support retrospective case studies of key events. Further analysis shows that the complexity of the input space required for correct ENSO phase prediction peaks when forecasts must cross the boreal spring predictability barrier.
📝 Abstract
This paper introduces a distillation framework for an ensemble of entropy-optimal Sparse Probabilistic Approximation (eSPA) models, trained exclusively on satellite-era observational and reanalysis data to predict ENSO phase up to 24 months in advance. While eSPA ensembles yield state-of-the-art forecast skill, they are harder to interpret than individual eSPA models. We show how to compress the ensemble into a compact set of "distilled" models by aggregating the structure of only those ensemble members that make correct predictions. This process yields a single, diagnostically tractable model for each forecast lead time that preserves forecast performance while also enabling diagnostics that are impractical to implement on the full ensemble.
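The member-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each ensemble member exposes a feature importance vector (a probability vector over input features, as eSPA models learn) and per-sample predicted phase labels. It also assumes that "aggregating the structure" means averaging those vectors over the members that classify a given sample correctly. The function name `distill_importance` and the array layouts are hypothetical.

```python
import numpy as np

def distill_importance(member_preds, member_W, y_true):
    """Average feature importance over members that predict each sample correctly.

    Hypothetical layout (an assumption, not from the paper):
      member_preds: (n_members, n_samples) predicted ENSO phase labels
      member_W:     (n_members, n_features) importance vectors, rows sum to 1
      y_true:       (n_samples,) observed phase labels
    Returns an (n_samples, n_features) array; a row is NaN when no member
    predicts that sample correctly.
    """
    member_preds = np.asarray(member_preds)
    member_W = np.asarray(member_W, dtype=float)
    y_true = np.asarray(y_true)
    # correct[m, t] is True when member m predicts sample t correctly.
    correct = member_preds == y_true[None, :]
    counts = correct.sum(axis=0)  # number of correct members per sample
    # Sum the importance vectors of the correct members for each sample.
    summed = correct.T.astype(float) @ member_W
    # Divide by the count; 0 / 0 yields NaN for samples with no correct member.
    with np.errstate(invalid="ignore", divide="ignore"):
        return summed / counts[:, None]
```

A NaN row flags a sample that no member gets right. The abstract does not say how such samples are handled, so any downstream treatment is a separate design choice.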
An analysis of the regime persistence of the distilled-model "superclusters", together with cross-lead clustering consistency, shows that the discretised system accurately captures the spatiotemporal dynamics of ENSO. Using the effective dimension of the feature importance vectors, we show that the complexity of the input space required for correct ENSO phase prediction peaks when forecasts must cross the boreal spring predictability barrier. Spatial importance maps derived from the feature importance vectors are introduced to identify where predictive information resides in each field; at certain lead times, these maps include known physical precursors. Case studies of key events are also presented, showing how fields reconstructed from distilled-model centroids trace the evolution from extratropical and inter-basin precursors to the mature ENSO state. Overall, the distillation framework enables a rigorous investigation of long-range ENSO predictability that complements real-time data-driven operational forecasts.
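Two of the diagnostics named above can be illustrated with a short sketch. The abstract gives no formulas, so both definitions here are assumptions:

- Effective dimension is taken as the exponential of the Shannon entropy of a normalised feature importance vector. The inverse participation ratio, 1/Σw², is a common alternative.
- Regime persistence is measured as the per-cluster self-transition probability of a time series of supercluster labels.

The function names are hypothetical.

```python
import numpy as np

def effective_dimension(w):
    """Exponential of the Shannon entropy of a normalised importance vector.

    Assumed definition (not taken from the paper): equals n for a uniform
    vector over n features and 1 when all weight sits on a single feature.
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    nz = w[w > 0]  # terms with w_i = 0 contribute 0 (0 * log 0 := 0)
    return float(np.exp(-np.sum(nz * np.log(nz))))

def self_transition_probs(labels, n_clusters):
    """Estimate P(label[t+1] == k | label[t] == k) for each cluster k.

    A simple persistence measure for a sequence of cluster assignments;
    NaN for clusters never occupied before the final time step.
    """
    labels = np.asarray(labels)
    prev, nxt = labels[:-1], labels[1:]
    probs = np.full(n_clusters, np.nan)
    for k in range(n_clusters):
        mask = prev == k
        if mask.any():
            # Fraction of visits to cluster k that stay in k at the next step.
            probs[k] = np.mean(nxt[mask] == k)
    return probs
```

For example, an importance vector that spreads weight evenly over four features has effective dimension 4. A label sequence `[0, 0, 1, 1, 1, 0]` gives self-transition probabilities of 0.5 for cluster 0 and 2/3 for cluster 1.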