🤖 AI Summary
Sampling-based controllers such as Model Predictive Path Integral (MPPI) control often suffer from high variance and low sample efficiency, which limits their use where samples are costly or scarce. This work proposes a hybrid variance-reduced MPPI framework that decomposes the objective function into a quadratic approximation and a residual term, yielding a closed-form, model-guided prior that concentrates samples in informative regions. The approach is agnostic to the source of geometric information: the quadratic model can be built from exact derivatives, Gauss-Newton or quasi-Newton approximations, or gradient-free randomized smoothing. Evaluated on standard optimization benchmarks, a nonlinear, underactuated cart-pole control task, and a contact-rich manipulation problem with non-smooth dynamics, the method converges faster and performs better than standard MPPI in low-sample regimes.
📝 Abstract
Sampling-based controllers, such as Model Predictive Path Integral (MPPI) methods, offer substantial flexibility but often suffer from high variance and low sample efficiency. To address these challenges, we introduce a hybrid variance-reduced MPPI framework that integrates a prior model into the sampling process. Our key insight is to decompose the objective function into a known approximate model and a residual term. Since the residual captures only the discrepancy between the model and the objective, it typically exhibits a smaller magnitude and lower variance than the original objective. Although this principle applies to general modeling choices, we demonstrate that adopting a quadratic approximation enables the derivation of a closed-form, model-guided prior that effectively concentrates samples in informative regions. Crucially, the framework is agnostic to the source of geometric information, allowing the quadratic model to be constructed from exact derivatives, structural approximations (e.g., Gauss-Newton or quasi-Newton), or gradient-free randomized smoothing. We validate the approach on standard optimization benchmarks, a nonlinear, underactuated cart-pole control task, and a contact-rich manipulation problem with non-smooth dynamics. Across these domains, we achieve faster convergence and superior performance in low-sample regimes compared to standard MPPI. These results suggest that the method can make sampling-based control strategies more practical in scenarios where obtaining samples is expensive or limited.
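To make the decomposition concrete, the following is a minimal sketch of one way a quadratic model can produce a closed-form sampling prior. It is an illustration under stated assumptions, not the paper's algorithm: it assumes a Gaussian sampling distribution N(mu, Sigma), the usual exponential MPPI weighting exp(-J/lam), and a quadratic model built from a gradient g and a curvature matrix H at mu. Multiplying the Gaussian by exp(-model/lam) gives another Gaussian, so samples can be drawn from it directly and weighted by the residual alone. The names `guided_prior`, `guided_mppi_step`, and `lam` are hypothetical.

```python
import numpy as np


def guided_prior(mu, Sigma, g, H, lam):
    """Gaussian proportional to N(u; mu, Sigma) * exp(-m(u) / lam).

    m(u) = J(mu) + g.(u - mu) + 0.5 (u - mu)^T H (u - mu) is the assumed
    quadratic model. Completing the square gives
        Sigma_g^{-1} = Sigma^{-1} + H / lam
        mu_g         = mu - Sigma_g @ g / lam
    """
    precision = np.linalg.inv(Sigma) + H / lam
    # Raises numpy.linalg.LinAlgError if the combined precision is not
    # positive definite (possible when H is indefinite).
    np.linalg.cholesky(precision)
    Sigma_g = np.linalg.inv(precision)
    Sigma_g = 0.5 * (Sigma_g + Sigma_g.T)  # remove round-off asymmetry
    mu_g = mu - Sigma_g @ g / lam
    return mu_g, Sigma_g


def guided_mppi_step(J, mu, Sigma, g, H, lam, n_samples, rng):
    """One update of the mean: sample from the guided prior and weight
    each sample by the residual J(u) - m(u) instead of the full cost."""
    mu_g, Sigma_g = guided_prior(mu, Sigma, g, H, lam)
    U = rng.multivariate_normal(mu_g, Sigma_g, size=n_samples)

    D = U - mu
    model = J(mu) + D @ g + 0.5 * np.einsum("ni,ij,nj->n", D, H, D)
    residual = np.array([J(u) for u in U]) - model

    # Subtract the minimum before exponentiating for numerical stability.
    w = np.exp(-(residual - residual.min()) / lam)
    w /= w.sum()
    return w @ U
```

As a sanity check on the algebra, take J(u) = 0.5 * ||u - a||^2 with a = (2, -2), mu = 0, Sigma = I, and lam = 1, so that g = -a and H = I. The guided prior then has mean (1, -1) and covariance 0.5 I, the residual is zero up to round-off, and all weights are equal. When the model is less accurate, the residual weights correct for the mismatch.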