🤖 AI Summary
This study addresses a fundamental challenge in modeling multiscale dynamical systems: *how to automatically identify a small set of relevant slow variables from which interpretable and predictive low-dimensional models can be built*. To this end, we propose a data-driven dimensionality-reduction framework grounded in the information bottleneck principle. Methodologically, we establish an analytical connection between the relevant slow variables and the eigenfunctions of the transfer (Koopman) operator describing the dynamics, showing that in the limit of strong compression the relevant variables are determined by the slowest-decaying eigenfunctions; we derive an information-based criterion indicating when to stop increasing the complexity of the reduced model; and we combine variational inference with autoencoding neural networks to build an interpretable deep learning architecture capable of discovering emergent order parameters. Applied to uncurated satellite videos of atmospheric flows, the method extracts the dominant slow collective variables; applied to experimental videos of cyanobacteria colonies, it uncovers an emergent synchronization order parameter. The framework thereby balances model interpretability with predictive accuracy.
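To make the "variational inference with autoencoding neural networks" idea concrete, below is a minimal PyTorch sketch of a variational information-bottleneck autoencoder of the kind described above: the present observation is compressed into a low-dimensional latent variable, which is then required to predict a future observation, while a KL term penalizes how much information the latent retains about the input. The module name `PastFutureIB`, the layer sizes, the Gaussian prior, and the squared-error prediction loss are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class PastFutureIB(nn.Module):
    """Sketch of a variational information-bottleneck autoencoder:
    compress the present observation x_t into a low-dimensional z and
    require z to predict the future observation x_{t+tau}."""

    def __init__(self, dim_x: int, dim_z: int = 2, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(dim_x, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * dim_z),   # mean and log-variance of q(z | x_t)
        )
        self.decoder = nn.Sequential(
            nn.Linear(dim_z, hidden), nn.ReLU(),
            nn.Linear(hidden, dim_x),       # prediction of x_{t+tau}
        )

    def forward(self, x_t: torch.Tensor, x_future: torch.Tensor, beta: float = 1e-3):
        mu, log_var = self.encoder(x_t).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # reparameterization trick
        pred = self.decoder(z)
        # Prediction term: how well the compressed variable anticipates the future frame.
        prediction_loss = ((pred - x_future) ** 2).sum(dim=-1).mean()
        # Compression term: KL(q(z|x_t) || N(0, I)), a variational bound tied to I(x_t; z).
        kl = -0.5 * (1.0 + log_var - mu.pow(2) - log_var.exp()).sum(dim=-1).mean()
        return prediction_loss + beta * kl


if __name__ == "__main__":
    # Shapes only: real inputs would be flattened video frames at times t and t + tau.
    model = PastFutureIB(dim_x=4096, dim_z=2)
    loss = model(torch.randn(32, 4096), torch.randn(32, 4096))
    loss.backward()
    print(float(loss))
```

Sweeping `beta` (or, equivalently, the latent dimension `dim_z`) traces out the trade-off between compression and predictive power on which the stopping criterion described above is based.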
📝 Abstract
Model reduction is the construction of simple yet predictive descriptions of the dynamics of many-body systems in terms of a few relevant variables. A prerequisite to model reduction is the identification of these relevant variables, a task for which no general method exists. Here, we develop a systematic approach based on the information bottleneck to identify the relevant variables, defined as those most predictive of the future. We elucidate analytically the relation between these relevant variables and the eigenfunctions of the transfer operator describing the dynamics. Further, we show that in the limit of high compression, the relevant variables are directly determined by the slowest-decaying eigenfunctions. Our information-based approach indicates when to optimally stop increasing the complexity of the reduced model. Furthermore, it provides a firm foundation to construct interpretable deep learning tools that perform model reduction. We illustrate how these tools work in practice by considering uncurated videos of atmospheric flows from which our algorithms automatically extract the dominant slow collective variables, as well as experimental videos of cyanobacteria colonies in which we discover an emergent synchronization order parameter.

Significance Statement

The first step to understand natural phenomena is to intuit which variables best describe them. An ambitious goal of artificial intelligence is to automate this process. Here, we develop a framework to identify these relevant variables directly from complex datasets. Very much like MP3 compression is about retaining information that matters most to the human ear, our approach is about keeping information that matters most to predict the future. We formalize this insight mathematically and systematically answer the question of when to stop increasing the complexity of minimal models. We illustrate how interpretable deep learning tools built on these ideas reveal emergent collective variables in settings ranging from satellite recordings of atmospheric fluid flows to experimental videos of cyanobacteria colonies.
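For readers who want the objective stated symbolically, the following is a schematic form of the past-future information bottleneck described in the abstract; the notation (x_t for the present state, x_{t+\tau} for the future state, \tilde{x} for the compressed representation, \beta for the trade-off parameter) is ours and is not necessarily the paper's.

```latex
% Schematic past-future information bottleneck (notation assumed, not taken from the paper):
% find a stochastic compression p(\tilde{x} | x_t) of the present state that retains as much
% information as possible about the future state at a given level of compression.
\min_{p(\tilde{x}\,\mid\,x_t)} \;\Big[\, I(\tilde{x};\,x_t) \;-\; \beta\, I(\tilde{x};\,x_{t+\tau}) \,\Big]
```

In this notation, the abstract's high-compression result says that as the compression becomes strong, the optimal \tilde{x} reduces to a function of the slowest-decaying eigenfunctions of the transfer operator, which is what makes the learned variables interpretable as slow collective modes.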