🤖 AI Summary
Deep reinforcement learning (DRL) suffers from low sample efficiency, opaque policy representations, and high deployment overhead—critical bottlenecks in data-scarce, compute-constrained, and interpretability-sensitive domains. Method: This paper introduces SINDy-RL, a framework unifying sparse dynamical modeling, reward function learning, and control policy synthesis. It combines Sparse Identification of Nonlinear Dynamics (SINDy), model-based RL, and a low-dimensional sparse-dictionary policy parameterization to jointly optimize for sample efficiency, physical interpretability, and deployability. Contributions/Results: On benchmark control and fluid simulation tasks, SINDy-RL matches state-of-the-art DRL performance while substantially reducing environment interactions and compressing policy parameters by orders of magnitude. Crucially, the learned policy is an explicit, analytically tractable symbolic function—sidestepping the embedded-system deployment limitations of black-box deep policies.
📝 Abstract
Deep reinforcement learning (DRL) has shown significant promise for uncovering sophisticated control policies that interact in environments with complicated dynamics, such as stabilizing the magnetohydrodynamics of a tokamak fusion reactor or minimizing the drag force exerted on an object in a fluid flow. However, these algorithms require an abundance of training examples and may become prohibitively expensive for many applications. In addition, the reliance on deep neural networks often results in an uninterpretable, black-box policy that may be too computationally expensive to use with certain embedded systems. Recent advances in sparse dictionary learning, such as the sparse identification of nonlinear dynamics (SINDy), have shown promise for creating efficient and interpretable data-driven models in the low-data regime. In this work we introduce SINDy-RL, a unifying framework for combining SINDy and DRL to create efficient, interpretable, and trustworthy representations of the dynamics model, reward function, and control policy. We demonstrate the effectiveness of our approaches on benchmark control environments and challenging fluids problems. SINDy-RL achieves comparable performance to state-of-the-art DRL algorithms using significantly fewer interactions in the environment and results in an interpretable control policy orders of magnitude smaller than a deep neural network policy.
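The sparse dictionary learning step at the heart of SINDy, as described in the abstract, can be illustrated with a minimal sketch: regress observed state derivatives onto a library of candidate functions, then prune small coefficients and refit (sequential thresholded least squares). The toy system, library choice, and function names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequential thresholded least squares: fit sparse Xi with dxdt ~ theta @ Xi."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold          # prune near-zero library terms
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):          # refit each state on active terms
            active = ~small[:, k]
            if active.any():
                xi[active, k] = np.linalg.lstsq(
                    theta[:, active], dxdt[:, k], rcond=None)[0]
    return xi

# Toy ground-truth dynamics (assumed for the demo): dx/dt = -2x, dy/dt = x*y
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
dX = np.column_stack([-2.0 * X[:, 0], X[:, 0] * X[:, 1]])

# Candidate function library: [1, x, y, x^2, x*y, y^2]
theta = np.column_stack([np.ones(len(X)), X[:, 0], X[:, 1],
                         X[:, 0]**2, X[:, 0] * X[:, 1], X[:, 1]**2])

Xi = stlsq(theta, dX)
# Xi is sparse: only the x term of dx/dt and the x*y term of dy/dt survive,
# yielding an interpretable symbolic model of the dynamics.
```

The same sparse-regression machinery can, in principle, be pointed at the reward function or the policy itself, which is how SINDy-RL obtains control policies far smaller than a deep network.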