🤖 AI Summary
To address the challenge of optimal control for nonlinear stochastic systems, this paper proposes the Spectral Dynamics Embedding Control (SDEC) algorithm. SDEC builds on spectral dynamics embedding, which represents the state-value function linearly in an infinite-dimensional feature, and combines a finite-dimensional truncation of that feature with the nonlinear dynamics to obtain a tractable control algorithm. Its key contribution is a theoretical analysis that characterizes, in both policy evaluation and policy optimization, (i) the approximation error induced by the finite-dimensional truncation and (ii) the statistical error induced by finite-sample approximation. SDEC is tested empirically on the pendulum swing-up problem, where its performance is compared with Koopman-based methods and iterative Linear-Quadratic Regulator (iLQR) methods.
📝 Abstract
Optimal control is notoriously difficult for stochastic nonlinear systems. Spectral Dynamics Embedding was introduced in [1] to develop reinforcement learning methods for controlling an unknown system. It uses an infinite-dimensional feature to linearly represent the state-value function and relies on a finite-dimensional truncation for practical implementation. However, the properties of this finite-dimensional approximation in control have not been investigated, even when the model is known. In this paper, we provide a tractable stochastic nonlinear control algorithm, Spectral Dynamics Embedding Control (SDEC), that exploits the nonlinear dynamics on top of the finite-dimensional feature approximation. We give an in-depth theoretical analysis that characterizes the approximation error induced by the finite-dimensional truncation and the statistical error induced by finite-sample approximation, in both policy evaluation and policy optimization. We also test the algorithm empirically and compare its performance with Koopman-based methods and iLQR methods on the pendulum swing-up problem.
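To make the idea of finite-dimensional truncation concrete, here is a minimal sketch that is not taken from the paper. It assumes the process noise is Gaussian, so that the transition density is a Gaussian kernel, and it uses random Fourier features as one possible finite-dimensional feature map; the abstract specifies neither choice. The function names (`gaussian_kernel`, `make_random_features`, `truncation_error`) and all parameter values are hypothetical. The sketch only measures how the kernel approximation error changes with the number of features; it does not implement SDEC itself.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) for all row pairs."""
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-sq_dist / (2.0 * sigma**2))


def make_random_features(dim, num_features, sigma, rng):
    """Return phi with phi(x) . phi(y) approximating the Gaussian kernel.

    Assumption: random Fourier features stand in for the finite-dimensional
    truncation; this is an illustrative choice, not the paper's stated one.
    """
    W = rng.normal(scale=1.0 / sigma, size=(dim, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def phi(X):
        return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

    return phi


def truncation_error(num_features, sigma=1.0, dim=2, n_points=50, seed=0):
    """Largest absolute gap between the exact kernel and its feature approximation."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_points, dim))  # hypothetical sample of states
    phi = make_random_features(dim, num_features, sigma, rng)
    K_exact = gaussian_kernel(X, X, sigma)
    K_approx = phi(X) @ phi(X).T
    return float(np.max(np.abs(K_exact - K_approx)))


# Compare a coarse and a fine truncation on the same sample points.
err_small = truncation_error(16)
err_large = truncation_error(8192)
print(f"max error with 16 features:   {err_small:.4f}")
print(f"max error with 8192 features: {err_large:.4f}")
```

Under these assumptions, a value function that is linear in the exact infinite-dimensional feature becomes approximately linear in `phi`, and the gap measured here is the kind of truncation error that the analysis in the abstract sets out to characterize.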