ZORMS-LfD: Learning from Demonstrations with Zeroth-Order Random Matrix Search

📅 2025-07-22
🤖 AI Summary
This paper addresses the joint learning of cost functions, constraints, and system dynamics from expert demonstrations for constrained optimal control problems in both continuous and discrete time, without requiring a smooth learning loss or any gradient information. The proposed method, Zeroth-Order Random Matrix Search for Learning from Demonstrations (ZORMS-LfD), avoids gradient computation entirely and relies only on evaluations of the learning loss. On a range of benchmark problems, ZORMS-LfD matches or surpasses state-of-the-art methods in both learning loss and compute time. On unconstrained continuous-time problems, it reaches loss comparable to state-of-the-art first-order methods while cutting compute time by over 80%. On constrained continuous-time problems, where no specialized state-of-the-art method exists, it outperforms the commonly used gradient-free Nelder-Mead method.

📝 Abstract
We propose Zeroth-Order Random Matrix Search for Learning from Demonstrations (ZORMS-LfD). ZORMS-LfD enables the costs, constraints, and dynamics of constrained optimal control problems, in both continuous and discrete time, to be learned from expert demonstrations without requiring smoothness of the learning-loss landscape. In contrast, existing state-of-the-art first-order methods require the existence and computation of gradients of the costs, constraints, dynamics, and learning loss with respect to states, controls and/or parameters. Most existing methods are also tailored to discrete time, with constrained problems in continuous time receiving only cursory attention. We demonstrate that ZORMS-LfD matches or surpasses the performance of state-of-the-art methods in terms of both learning loss and compute time across a variety of benchmark problems. On unconstrained continuous-time benchmark problems, ZORMS-LfD achieves similar loss performance to state-of-the-art first-order methods with an over 80% reduction in compute time. On constrained continuous-time benchmark problems where there is no specialized state-of-the-art method, ZORMS-LfD is shown to outperform the commonly used gradient-free Nelder-Mead optimization method.
Problem

Research questions and friction points this paper is trying to address.

Learn costs, constraints, and dynamics from expert demonstrations
Avoid gradient computation in optimal control problems
Handle constrained problems in both continuous and discrete time
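
To make the learning problem above concrete, here is a minimal sketch of a learning loss for a toy problem. It is not taken from the paper. It assumes a scalar discrete-time linear system with a quadratic cost, where the only unknown parameter is the state weight `q`, and it measures the L1 gap between the expert demonstration and the optimal trajectory produced by a candidate `q`. The names `lqr_gains`, `rollout`, and `learning_loss`, and all numeric values, are hypothetical.

```python
def lqr_gains(q, r, a, b, horizon):
    """Finite-horizon scalar LQR gains via backward Riccati recursion.

    Assumes cost sum_t (q*x_t^2 + r*u_t^2) plus terminal cost q*x_T^2
    and dynamics x_{t+1} = a*x_t + b*u_t (illustrative assumptions).
    """
    p = q  # terminal cost-to-go weight
    gains = []
    for _ in range(horizon):
        k = a * b * p / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
        gains.append(k)
    gains.reverse()  # the recursion runs backward in time
    return gains


def rollout(q, r=1.0, a=1.0, b=0.5, x0=1.0, horizon=20):
    """Optimal state and control trajectories for a candidate weight q."""
    xs, us = [x0], []
    x = x0
    for k in lqr_gains(q, r, a, b, horizon):
        u = -k * x
        x = a * x + b * u
        us.append(u)
        xs.append(x)
    return xs, us


def learning_loss(q, demo):
    """L1 discrepancy between the demonstration and the rollout under q."""
    xs, us = rollout(q)
    demo_xs, demo_us = demo
    state_gap = sum(abs(x - dx) for x, dx in zip(xs, demo_xs))
    control_gap = sum(abs(u - du) for u, du in zip(us, demo_us))
    return state_gap + control_gap


# The "expert" is simulated here with a hypothetical true weight q = 2.0.
demo = rollout(2.0)
print(learning_loss(2.0, demo), learning_loss(0.5, demo))
```

The loss only requires solving the control problem for each candidate parameter and comparing trajectories, so it can be evaluated even when differentiating through the solver is impractical.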
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zeroth-Order Random Matrix Search applied to learning from demonstrations
Requires neither gradients nor a smooth learning-loss landscape
Outperforms the gradient-free Nelder-Mead method on constrained continuous-time problems
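
As a rough illustration of what "zeroth-order" means in the points above, the sketch below minimizes a nonsmooth loss using only loss evaluations. It is a generic two-point random-direction search over a parameter vector, not the random matrix search of ZORMS-LfD, whose details are not given on this page. The function names, step sizes, and the toy L1 loss are all assumptions made for illustration.

```python
import math
import random


def l1_loss(theta, target):
    """Nonsmooth toy loss: L1 distance to a target parameter vector."""
    return sum(abs(t - s) for t, s in zip(theta, target))


def zeroth_order_search(loss, theta0, step=0.1, smoothing=1e-2,
                        iters=2000, seed=0):
    """Generic two-point random-direction search (illustrative only).

    Each iteration samples a Gaussian direction u, estimates the
    directional slope from two loss evaluations, and moves against it
    with a diminishing step size. No gradient of the loss is computed.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    best_theta, best_loss = list(theta), loss(theta)
    for k in range(iters):
        u = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = [t + smoothing * d for t, d in zip(theta, u)]
        minus = [t - smoothing * d for t, d in zip(theta, u)]
        slope = (loss(plus) - loss(minus)) / (2.0 * smoothing)
        lr = step / math.sqrt(k + 1)
        theta = [t - lr * slope * d for t, d in zip(theta, u)]
        current = loss(theta)
        if current < best_loss:  # keep the best iterate seen so far
            best_theta, best_loss = list(theta), current
    return best_theta, best_loss


# Hypothetical usage: recover a three-dimensional target from zeros.
target = [1.0, -2.0, 0.5]
theta, value = zeroth_order_search(lambda th: l1_loss(th, target),
                                   [0.0, 0.0, 0.0])
print(theta, value)
```

Because the update uses only two loss evaluations per iteration, the same loop applies unchanged to losses that have kinks or that are defined through an optimal control solver.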