🤖 AI Summary
To address the high sampling cost and black-box nature of surrogate environments in model-based reinforcement learning (MBRL), this paper uses Sparse Identification of Nonlinear Dynamics (SINDy) to construct surrogate environments, presented as SINDy's first application in MBRL. The resulting surrogate is a lightweight model expressed as explicit, sparse dynamical equations. Using only minimal real-world interaction (75 steps on Mountain Car, 1,000 on Lunar Lander), it faithfully reproduces the environment dynamics, achieving state-wise correlations above 0.997 and mean squared errors (MSE) on the order of 10⁻⁶. Total training steps decrease by 20–35%, while policy performance is comparable to that of agents trained in the true environment. The core contribution is an alternative to black-box neural-network surrogates: the proposed model combines high fidelity, low data requirements, and explicit, interpretable structure, offering an efficient and interpretable approach to model-based RL.
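To make the SINDy idea concrete, here is a minimal sketch of sequentially thresholded least squares (STLSQ), the sparse regression at the core of the original SINDy algorithm, applied to Mountain Car with the 75-interaction budget quoted above. It is not the paper's implementation. Assumptions: the "real" environment is a re-implementation of the Gym MountainCar-v0 update rule (so no Gym install is needed), the model is fit on discrete-time state increments, and the candidate library (including the `cos(3p)` term), the threshold `1e-4`, and all function names (`true_step`, `library`, `stlsq`, `surrogate_step`) are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical re-implementation of the Gym MountainCar-v0 update rule,
# used only to generate training transitions for this sketch.
FORCE, GRAVITY, MAX_SPEED = 0.001, 0.0025, 0.07
MIN_POS, MAX_POS = -1.2, 0.6


def true_step(p, v, a):
    """One step of the re-implemented Mountain Car dynamics; a in {0, 1, 2}."""
    v = float(np.clip(v + (a - 1) * FORCE - GRAVITY * np.cos(3 * p), -MAX_SPEED, MAX_SPEED))
    p = float(np.clip(p + v, MIN_POS, MAX_POS))
    if p == MIN_POS and v < 0:
        v = 0.0
    return p, v


# Candidate function library Theta(p, v, a); the choice of terms is an assumption.
NAMES = ["1", "p", "v", "a", "cos(3p)", "sin(3p)", "v^2"]


def library(p, v, a):
    p, v, a = (np.atleast_1d(np.asarray(x, dtype=float)) for x in (p, v, a))
    return np.column_stack([np.ones_like(p), p, v, a, np.cos(3 * p), np.sin(3 * p), v ** 2])


def stlsq(theta, dx, threshold=1e-4, iters=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit the rest."""
    xi = np.linalg.lstsq(theta, dx, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        for k in range(dx.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                xi[keep, k] = np.linalg.lstsq(theta[:, keep], dx[:, k], rcond=None)[0]
    return xi


# Collect 75 random-action transitions from the "real" environment.
rng = np.random.default_rng(0)
p, v = rng.uniform(-0.6, -0.4), 0.0
states, actions, deltas = [], [], []
for _ in range(75):
    a = int(rng.integers(3))
    p_next, v_next = true_step(p, v, a)
    states.append((p, v))
    actions.append(a)
    deltas.append((p_next - p, v_next - v))
    p, v = p_next, v_next

states, deltas = np.array(states), np.array(deltas)
xi = stlsq(library(states[:, 0], states[:, 1], actions), deltas)


def surrogate_step(p, v, a):
    """Surrogate environment step: x_{k+1} = x_k + Theta(x_k, a_k) @ xi."""
    dp, dv = (library(p, v, a) @ xi)[0]
    return p + dp, v + dv


# Print the identified sparse equations (one per state increment).
for k, lhs in enumerate(["dp", "dv"]):
    terms = [f"{xi[j, k]:+.4g}*{NAMES[j]}" for j in range(len(NAMES)) if xi[j, k] != 0]
    print(lhs, "=", " ".join(terms))
```

Because this toy simulator is noise-free, the short random rollout stays away from the clipping bounds, and the true terms are in the library, STLSQ recovers the update exactly (Δv = 0.001·a − 0.001 − 0.0025·cos(3p), Δp = v + Δv) and zeroes the spurious terms. With noisy data or a less informed library, the threshold would need tuning.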
📝 Abstract
This paper introduces an approach for developing surrogate environments in reinforcement learning (RL) using the Sparse Identification of Nonlinear Dynamics (SINDy) algorithm. We demonstrate the effectiveness of our approach through extensive experiments in OpenAI Gym environments, particularly Mountain Car and Lunar Lander. Our results show that SINDy-based surrogate models can accurately capture the underlying dynamics of these environments while reducing training cost, measured in total steps, by 20–35%. With only 75 interactions for Mountain Car and 1,000 for Lunar Lander, we achieve state-wise correlations exceeding 0.997, with mean squared errors as low as 3.11e-06 for Mountain Car velocity and 1.42e-06 for Lunar Lander position. RL agents trained in these surrogate environments require fewer total steps (65,075 vs. 100,000 for Mountain Car and 801,000 vs. 1,000,000 for Lunar Lander) while achieving performance comparable to agents trained in the original environments, with similar convergence patterns and final performance metrics. This work contributes to the field of model-based RL by providing an efficient method for generating accurate, interpretable surrogate environments.
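The fidelity and cost figures in the abstract reduce to simple arithmetic. The sketch below assumes (the abstract does not specify the protocol) that "state-wise correlation" means the Pearson correlation between true and surrogate values of each state variable along the same trajectory, with MSE computed per variable; `statewise_metrics` and `step_reduction_pct` are hypothetical helper names. It also confirms that the reported step totals correspond to roughly 35% and 20% reductions.

```python
import numpy as np


def statewise_metrics(true_traj, surrogate_traj):
    """Per-state-variable Pearson correlation and MSE for (T, d) trajectories."""
    true_traj = np.asarray(true_traj, dtype=float)
    surrogate_traj = np.asarray(surrogate_traj, dtype=float)
    corr = np.array([
        np.corrcoef(true_traj[:, j], surrogate_traj[:, j])[0, 1]
        for j in range(true_traj.shape[1])
    ])
    mse = np.mean((true_traj - surrogate_traj) ** 2, axis=0)
    return corr, mse


def step_reduction_pct(surrogate_steps, baseline_steps):
    """Percentage fewer total steps relative to the baseline run."""
    return 100.0 * (1.0 - surrogate_steps / baseline_steps)


# Toy check: a hypothetical surrogate that is off by a constant 1e-3 in every state.
t = np.linspace(0.0, 1.0, 50)
true_traj = np.column_stack([np.sin(t), np.cos(t)])
corr, mse = statewise_metrics(true_traj, true_traj + 1e-3)

# Step totals reported in the abstract.
mc = step_reduction_pct(65_075, 100_000)
ll = step_reduction_pct(801_000, 1_000_000)
print(f"Mountain Car: {mc:.1f}% fewer steps")  # Mountain Car: 34.9% fewer steps
print(f"Lunar Lander: {ll:.1f}% fewer steps")  # Lunar Lander: 19.9% fewer steps
```

A constant offset leaves the correlation at 1 while the MSE equals the squared offset (here 10⁻⁶), which is why reporting both metrics together is more informative than either alone.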