Bregman Centroid Guided Cross-Entropy Method

📅 2025-06-02
🤖 AI Summary
The Cross-Entropy Method (CEM), widely used for trajectory optimization in model-based reinforcement learning (MBRL), often converges prematurely in multimodal optimization landscapes because it samples from a unimodal distribution. This paper proposes Bregman Centroid Guided CEM ($\mathcal{BC}$-EvoCEM), a lightweight enhancement to ensemble CEM. It computes a performance-weighted Bregman centroid across the CEM workers and updates the least contributing workers by sampling within a trust region around that centroid, which aggregates information across workers while controlling their diversity. Because of the duality between Bregman divergences and exponential-family distributions, the method integrates into standard CEM pipelines with negligible overhead. On synthetic benchmarks, a cluttered navigation task, and full MBRL pipelines, it improves both convergence and solution quality relative to standard CEM.

📝 Abstract
The Cross-Entropy Method (CEM) is a widely adopted trajectory optimizer in model-based reinforcement learning (MBRL), but its unimodal sampling strategy often leads to premature convergence in multimodal landscapes. In this work, we propose Bregman Centroid Guided CEM ($\mathcal{BC}$-EvoCEM), a lightweight enhancement to ensemble CEM that leverages Bregman centroids for principled information aggregation and diversity control. $\mathcal{BC}$-EvoCEM computes a performance-weighted Bregman centroid across CEM workers and updates the least contributing ones by sampling within a trust region around the centroid. Leveraging the duality between Bregman divergences and exponential family distributions, we show that $\mathcal{BC}$-EvoCEM integrates seamlessly into standard CEM pipelines with negligible overhead. Empirical results on synthetic benchmarks, a cluttered navigation task, and full MBRL pipelines demonstrate that $\mathcal{BC}$-EvoCEM enhances both convergence and solution quality, providing a simple yet effective upgrade for CEM.
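The update described in the abstract can be sketched roughly as follows. This is an illustration under several assumptions that the abstract does not confirm: each worker is a diagonal Gaussian, the centroid is the moment-matched Gaussian (one of the sided Bregman centroids for exponential families), performance weights come from a softmax over scores, "least contributing" is approximated by "lowest score", and the trust region is a Euclidean ball of radius `radius` around the centroid mean. The names `weighted_centroid` and `respawn_worst` are hypothetical and not taken from the paper.

```python
import numpy as np

def weighted_centroid(means, variances, scores, temperature=1.0):
    """Moment-matched Gaussian centroid with softmax performance weights.

    Assumption: higher score means a better worker.
    """
    scores = np.asarray(scores, dtype=float)
    z = (scores - scores.max()) / temperature   # shift for numerical stability
    w = np.exp(z)
    w /= w.sum()
    mu = w @ means                              # weighted first moment
    second = w @ (variances + means ** 2)       # weighted second moment
    var = second - mu ** 2                      # centroid variance (per dimension)
    return mu, var, w

def respawn_worst(means, variances, scores, k=1, radius=0.5, rng=None):
    """Re-initialize the k lowest-scoring workers near the centroid.

    Assumption: the trust region is a Euclidean ball around the centroid mean.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu_c, var_c, _ = weighted_centroid(means, variances, scores)
    worst = np.argsort(scores)[:k]              # stand-in for "least contributing"
    new_means, new_vars = means.copy(), variances.copy()
    for i in worst:
        direction = rng.normal(size=mu_c.shape)
        direction /= np.linalg.norm(direction)
        # Radius scaling that makes the draw uniform inside the ball.
        step = radius * rng.uniform() ** (1.0 / mu_c.size)
        new_means[i] = mu_c + step * direction
        new_vars[i] = var_c
    return new_means, new_vars
```

In a full ensemble loop, a step like `respawn_worst` would run between ordinary per-worker CEM iterations; how often it runs and how many workers it replaces are tuning choices this sketch leaves open.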
Problem

Research questions and friction points this paper is trying to address.

Addresses premature convergence in CEM due to unimodal sampling
Enhances CEM with Bregman centroids for diversity control
Improves convergence and solution quality in MBRL tasks
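For context, the baseline being improved can be sketched as a vanilla CEM loop. This is the common textbook form (sample from one diagonal Gaussian, keep the top-scoring elites, refit the Gaussian), not code from the paper; the toy objective `two_peaks` and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

def vanilla_cem(objective, mean, std, n_samples=200, n_elite=20, n_iters=30, seed=0):
    """Maximize `objective` with a single diagonal-Gaussian sampler."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    for _ in range(n_iters):
        samples = rng.normal(mean, std, size=(n_samples, mean.size))
        values = objective(samples)
        elite = samples[np.argsort(values)[-n_elite:]]  # keep the best samples
        mean = elite.mean(axis=0)                       # refit the unimodal sampler
        std = elite.std(axis=0) + 1e-6                  # small floor avoids a zero std
    return mean, std

def two_peaks(x):
    """Toy bimodal objective: peaks at x = +2 (height 1.0) and x = -2 (height 0.8)."""
    x = x[:, 0]
    return np.maximum(np.exp(-(x - 2.0) ** 2), 0.8 * np.exp(-(x + 2.0) ** 2))

final_mean, final_std = vanilla_cem(two_peaks, mean=[0.0], std=[3.0])
```

Because a single Gaussian has to cover whatever the elites look like, the refit step tends to shrink `std` around one peak and stop exploring the other, which is the premature-convergence behavior listed above.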
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bregman centroids for diversity control
Trust region sampling for worker updates
Seamless integration with standard CEM
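As background for the first point, the standard definitions behind Bregman centroids can be written out. These are textbook facts rather than results specific to this paper, and the abstract does not state which sided centroid or which generator $\phi$ the method uses; the symbols $\phi$, $\psi$, $\theta$, $\eta$, and $w_i$ are notation chosen here.

```latex
% Bregman divergence generated by a strictly convex, differentiable \phi
D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle

% Exponential family p_\theta(x) = h(x)\exp(\langle \theta, T(x) \rangle - \psi(\theta)):
% the KL divergence is a Bregman divergence of the log-partition function \psi
\mathrm{KL}(p_{\theta_1} \,\|\, p_{\theta_2}) = D_\psi(\theta_2, \theta_1)

% Weighted centroid of workers \theta_1, \dots, \theta_K with weights w_i \ge 0, \sum_i w_i = 1
\theta_c = \arg\min_{\theta} \sum_{i=1}^{K} w_i \, \mathrm{KL}(p_{\theta_i} \,\|\, p_{\theta})
\quad\Longleftrightarrow\quad
\eta_c = \sum_{i=1}^{K} w_i \, \eta_i, \qquad \eta = \nabla\psi(\theta)
```

In words: for this choice of argument order, the centroid is a weighted average of the workers' expectation parameters $\eta_i$, so it has a closed form and costs about as much as averaging a few vectors. For Gaussian workers this amounts to averaging first and second moments.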
Authors

Yuliang Gu
Department of Mechanical Science and Engineering, UIUC, United States
Hongpeng Cao
Ph.D. Student, Technical University of Munich
robotics, deep reinforcement learning, control, computer vision
Marco Caccamo
Professor, Department of Mechanical Engineering, Technical University of Munich (TUM)
Real-Time and Cyber-Physical Systems
N. Hovakimyan
Department of Mechanical Science and Engineering, UIUC, United States