Policy Gradient for LQR with Domain Randomization

📅 2025-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Policy gradient (PG) methods for linear quadratic regulator (LQR) control under domain randomization (DR) lack theoretical guarantees, particularly regarding global convergence and finite-sample complexity. Method: We establish the first global convergence proof for PG in the DR-LQR setting and quantify its finite-sample complexity. To remove the conventional requirement of an initial jointly stabilizing controller, we propose a discount annealing algorithm that starts optimization from a policy that need not stabilize any of the sampled systems. Our approach combines policy gradient optimization, robust control theory for linear systems, and finite-sample analysis. Results: The analysis reveals how the system heterogeneity induced by domain randomization affects convergence behavior, and empirical evaluation shows that the method improves robustness and training stability in sim-to-real transfer, supporting both the theoretical insights and the practical efficacy of the approach.
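For orientation, the DR-LQR objective the summary refers to can be written as follows; the notation (distribution $\mathcal{D}$, per-system cost $C(K;A,B)$, weights $Q$, $R$) is our own shorthand and may differ from the paper's.

```latex
% Our shorthand for the DR-LQR setup; the paper's notation may differ.
% Dynamics: x_{t+1} = A x_t + B u_t, with static state feedback u_t = -K x_t.
C(K; A, B) = \mathbb{E}_{x_0}\!\left[\sum_{t=0}^{\infty} x_t^\top Q\, x_t + u_t^\top R\, u_t\right],
\qquad
C_{\mathcal{D}}(K) = \mathbb{E}_{(A,B)\sim\mathcal{D}}\big[C(K; A, B)\big].
```

PG methods descend a finite-sample surrogate $\hat{C}_M(K) = \frac{1}{M}\sum_{i=1}^{M} C(K; A_i, B_i)$ built from $M$ systems sampled from $\mathcal{D}$; this is the "finite-sample approximation of the DR objective" mentioned in the abstract below.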

📝 Abstract
Domain randomization (DR) enables sim-to-real transfer by training controllers on a distribution of simulated environments, with the goal of achieving robust performance in the real world. Although DR is widely used in practice and is often solved using simple policy gradient (PG) methods, understanding of its theoretical guarantees remains limited. Toward addressing this gap, we provide the first convergence analysis of PG methods for domain-randomized linear quadratic regulation (LQR). We show that PG converges globally to the minimizer of a finite-sample approximation of the DR objective under suitable bounds on the heterogeneity of the sampled systems. We also quantify the sample-complexity associated with achieving a small performance gap between the sample-average and population-level objectives. Additionally, we propose and analyze a discount-factor annealing algorithm that obviates the need for an initial jointly stabilizing controller, which may be challenging to find. Empirical results support our theoretical findings and highlight promising directions for future work, including risk-sensitive DR formulations and stochastic PG algorithms.
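As a concrete reference point, here is a minimal sketch of exact policy gradient descent on that finite-sample objective. It assumes the sampled systems $(A_i, B_i)$ are known and that the initial $K$ stabilizes all of them, and it uses the standard exact cost and gradient formulas for discrete-time LQR (in the style of Fazel et al.); the function names and default values are ours, not the paper's.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_cost_and_grad(K, A, B, Q, R, Sigma0):
    """Exact cost C(K; A, B) and its gradient for u_t = -K x_t (standard LQR-PG formulas)."""
    Acl = A - B @ K                                        # closed-loop dynamics
    # P solves P = Q + K'RK + Acl' P Acl (transposed to match scipy's convention)
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    # Sigma solves Sigma = Sigma0 + Acl Sigma Acl' (aggregate state covariance)
    Sigma = solve_discrete_lyapunov(Acl, Sigma0)
    cost = np.trace(P @ Sigma0)
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return cost, grad

def dr_pg(K, systems, Q, R, Sigma0, step=1e-3, iters=500):
    """Gradient descent on the sample-average objective (1/M) * sum_i C(K; A_i, B_i).

    Requires K to jointly stabilize every (A_i, B_i) at every iterate; the
    discount-annealing idea sketched further below removes this requirement.
    """
    for _ in range(iters):
        grads = [lqr_cost_and_grad(K, A, B, Q, R, Sigma0)[1] for A, B in systems]
        K = K - step * np.mean(grads, axis=0)
    return K
```

The fixed step size here is a placeholder; the paper's analysis would dictate which step sizes keep every iterate jointly stabilizing.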
Problem

Research questions and friction points this paper is trying to address.

Analyzing the global convergence of policy gradient for domain-randomized LQR
Quantifying the sample complexity of closing the gap between the sample-average and population-level DR objectives (schematized below)
Proposing discount-factor annealing to remove the need for an initial jointly stabilizing controller
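In the abstract's terms, the sample-complexity question asks how many sampled systems $M$ are needed before minimizing the sample-average objective also approximately minimizes the population-level one. Schematically, in our notation from above (this is the general shape of such a guarantee, not the paper's actual bound):

```latex
% Shape of a finite-sample guarantee (our schematic, not the paper's bound):
% with probability at least 1 - \delta over the M sampled systems,
\big|\hat{C}_M(K) - C_{\mathcal{D}}(K)\big| \le \varepsilon
\quad \text{whenever} \quad M \ge M_0(\varepsilon, \delta).
```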
Innovation

Methods, ideas, or system contributions that make the work stand out.

Global convergence analysis of policy gradient for domain-randomized LQR
Discount-factor annealing that removes the need for an initial jointly stabilizing controller (sketched below)
Finite-sample complexity bounds on the gap between the sample-average and population-level objectives
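A minimal sketch of how such discount-factor annealing could look, reusing the `dr_pg` routine from the sketch above. It relies on the standard fact that the $\gamma$-discounted LQR cost equals the undiscounted cost of the rescaled system $(\sqrt{\gamma}A, \sqrt{\gamma}B)$; the schedule, constants, and function name are our assumptions, not the authors' algorithm.

```python
import numpy as np

def discount_annealing(systems, Q, R, Sigma0, K_shape,
                       gamma0=1e-3, growth=1.1, inner_iters=200, step=1e-3):
    """Anneal the discount factor toward 1, keeping the current K stabilizing
    for the rescaled systems at every stage. Our reconstruction from the
    abstract, not the authors' pseudocode; gamma0 must satisfy
    gamma0 < 1 / max_i rho(A_i)^2 so that K = 0 stabilizes every rescaled
    system at the start."""
    K = np.zeros(K_shape)
    gamma = gamma0
    while gamma < 1.0:
        # Discounting by gamma == undiscounted LQR on (sqrt(gamma) A, sqrt(gamma) B).
        scaled = [(np.sqrt(gamma) * A, np.sqrt(gamma) * B) for A, B in systems]
        K = dr_pg(K, scaled, Q, R, Sigma0, step=step, iters=inner_iters)
        gamma = min(1.0, growth * gamma)   # grow gamma toward the target problem
    return dr_pg(K, systems, Q, R, Sigma0, step=step, iters=inner_iters)
```

The design intuition: each inner PG run ends with a controller that remains stabilizing for a slightly larger discount factor, so no jointly stabilizing controller is ever needed up front.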
Tesshu Fujinami
Department of Electrical and Systems Engineering, University of Pennsylvania
Bruce D. Lee
ETH Zurich
automatic control, machine learning, robotics
Nikolai Matni
Associate Professor of Electrical and Systems Engineering, University of Pennsylvania
Control Theory, Machine Learning, Optimization
George J. Pappas
Department of Electrical and Systems Engineering, University of Pennsylvania