🤖 AI Summary
This paper addresses learning the cost function, constraints, and dynamics of constrained optimal control problems from expert demonstrations, in both continuous and discrete time, without requiring a smooth learning loss or any gradient information. The authors propose Zeroth-Order Random Matrix Search for Learning from Demonstrations (ZORMS-LfD), a gradient-free method that applies in both time domains. Unlike state-of-the-art first-order methods, it does not need gradients of the costs, constraints, dynamics, or learning loss. On a variety of benchmark problems, ZORMS-LfD matches or surpasses state-of-the-art methods in both learning loss and compute time. On unconstrained continuous-time benchmarks it reaches similar loss to first-order methods with over 80% less compute time, and on constrained continuous-time benchmarks, where no specialized state-of-the-art method exists, it outperforms the gradient-free Nelder-Mead method.
📝 Abstract
We propose Zeroth-Order Random Matrix Search for Learning from Demonstrations (ZORMS-LfD). ZORMS-LfD enables the costs, constraints, and dynamics of constrained optimal control problems, in both continuous and discrete time, to be learned from expert demonstrations without requiring smoothness of the learning-loss landscape. In contrast, existing state-of-the-art first-order methods require the existence and computation of gradients of the costs, constraints, dynamics, and learning loss with respect to states, controls, and/or parameters. Most existing methods are also tailored to discrete time, with constrained problems in continuous time receiving only cursory attention. We demonstrate that ZORMS-LfD matches or surpasses the performance of state-of-the-art methods in terms of both learning loss and compute time across a variety of benchmark problems. On unconstrained continuous-time benchmark problems, ZORMS-LfD achieves similar loss performance to state-of-the-art first-order methods with an over 80% reduction in compute time. On constrained continuous-time benchmark problems where there is no specialized state-of-the-art method, ZORMS-LfD is shown to outperform the commonly used gradient-free Nelder-Mead optimization method.
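To make the idea of a zeroth-order search over a matrix of parameters concrete, here is a minimal sketch of a generic two-point random-direction update. It is not the ZORMS-LfD algorithm itself, whose details the abstract does not give. The function names (`l1_loss`, `random_matrix_search`), the step size, the smoothing radius, and the nonsmooth L1 stand-in loss are all assumptions for illustration. In an actual learning-from-demonstrations setting, each loss evaluation would involve solving an optimal control problem for the current parameters and comparing the result with the demonstrations.

```python
import random


def l1_loss(theta, target):
    """Nonsmooth stand-in loss: entrywise absolute error between two matrices."""
    return sum(
        abs(t - g)
        for row_t, row_g in zip(theta, target)
        for t, g in zip(row_t, row_g)
    )


def random_matrix_search(loss, theta0, step=0.01, smoothing=1e-3, iters=2000, seed=0):
    """Generic two-point zeroth-order search over a matrix (hypothetical sketch).

    Only loss values are queried; no gradient of `loss` is ever computed.
    Returns the best matrix visited and its loss.
    """
    rng = random.Random(seed)
    theta = [row[:] for row in theta0]  # work on a copy of the initial guess
    best_theta, best_loss = [row[:] for row in theta], loss(theta)

    for _ in range(iters):
        base = loss(theta)
        if base < best_loss:
            best_theta, best_loss = [row[:] for row in theta], base

        # Sample a random search direction with i.i.d. standard Gaussian entries.
        u = [[rng.gauss(0.0, 1.0) for _ in row] for row in theta]

        # Forward-difference estimate of the directional slope along u.
        probe = [[t + smoothing * d for t, d in zip(rt, ru)] for rt, ru in zip(theta, u)]
        slope = (loss(probe) - base) / smoothing

        # Move against the estimated slope along the sampled direction.
        theta = [[t - step * slope * d for t, d in zip(rt, ru)] for rt, ru in zip(theta, u)]

    final = loss(theta)
    if final < best_loss:
        best_theta, best_loss = [row[:] for row in theta], final
    return best_theta, best_loss


if __name__ == "__main__":
    target = [[1.0, -2.0], [0.5, 3.0]]  # hypothetical "true" parameters
    start = [[0.0, 0.0], [0.0, 0.0]]
    found, value = random_matrix_search(lambda th: l1_loss(th, target), start)
    print("best loss:", value)
```

Tracking the best iterate is a design choice for this sketch: with a fixed step size, a random search on a nonsmooth loss keeps fluctuating around the minimizer rather than settling, so returning the best point seen is more useful than returning the last one.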