Bregman Centroid Guided Cross-Entropy Method

📅 2025-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

The Cross-Entropy Method (CEM), as used in model-based reinforcement learning (MBRL), often converges prematurely in multimodal optimization landscapes because it samples from a unimodal distribution. This paper proposes Bregman Centroid Guided CEM ($\mathcal{BC}$-EvoCEM), a lightweight enhancement to ensemble CEM built on Bregman centroids. $\mathcal{BC}$-EvoCEM computes a performance-weighted Bregman centroid across the CEM workers and updates the least contributing workers by sampling within a trust region around that centroid, which aggregates information across workers while controlling their diversity. Leveraging the duality between Bregman divergences and exponential-family distributions, the method integrates into standard CEM pipelines with negligible overhead. On synthetic benchmarks, a cluttered navigation task, and full MBRL pipelines, $\mathcal{BC}$-EvoCEM improves both convergence and solution quality relative to standard CEM.
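To make the premature-convergence issue concrete, here is a minimal sketch of a vanilla single-Gaussian CEM loop. It is not taken from the paper: the function names `cem` and `two_basins`, the diagonal Gaussian, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def cem(objective, mean, std, n_samples=64, n_elite=8, n_iters=30, seed=0):
    """Vanilla CEM with one diagonal Gaussian (minimization). Illustrative only."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    for _ in range(n_iters):
        # Sample candidates from the current unimodal distribution
        samples = rng.normal(mean, std, size=(n_samples, mean.size))
        costs = np.array([objective(x) for x in samples])
        # Keep the lowest-cost candidates and refit the Gaussian to them
        elite = samples[np.argsort(costs)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # small floor avoids a zero std
    return mean, std

def two_basins(x):
    # Hypothetical toy cost: global minimum near x = +2, worse local minimum near x = -2
    return float(min((x[0] - 2.0) ** 2, (x[0] + 2.0) ** 2 + 0.5))

mean, std = cem(two_basins, mean=[0.0], std=[1.0])
```

Because a single Gaussian is refit to the elite set at every iteration, the sampling distribution shrinks onto one basin and stops exploring the other. Which basin it settles in depends on the early samples.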

📝 Abstract

The Cross-Entropy Method (CEM) is a widely adopted trajectory optimizer in model-based reinforcement learning (MBRL), but its unimodal sampling strategy often leads to premature convergence in multimodal landscapes. In this work, we propose Bregman Centroid Guided CEM ($\mathcal{BC}$-EvoCEM), a lightweight enhancement to ensemble CEM that leverages Bregman centroids for principled information aggregation and diversity control. $\mathcal{BC}$-EvoCEM computes a performance-weighted Bregman centroid across CEM workers and updates the least contributing ones by sampling within a trust region around the centroid. Leveraging the duality between Bregman divergences and exponential family distributions, we show that $\mathcal{BC}$-EvoCEM integrates seamlessly into standard CEM pipelines with negligible overhead. Empirical results on synthetic benchmarks, a cluttered navigation task, and full MBRL pipelines demonstrate that $\mathcal{BC}$-EvoCEM enhances both convergence and solution quality, providing a simple yet effective upgrade for CEM.
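The centroid-and-reset step described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes diagonal Gaussian workers, takes the centroid as the weighted arithmetic mean of natural parameters, uses softmax weights over worker costs, treats the highest-cost workers as the "least contributing" ones, and models the trust region as a Euclidean ball of hypothetical radius `radius` around the centroid mean. All function names are made up for this sketch.

```python
import numpy as np

def to_natural(mean, var):
    # Diagonal Gaussian natural parameters: eta1 = mean / var, eta2 = -1 / (2 var)
    return mean / var, -0.5 / var

def from_natural(eta1, eta2):
    var = -0.5 / eta2
    return eta1 * var, var

def bregman_centroid(means, variances, weights):
    """Weighted average in natural-parameter space (ASSUMED choice of centroid side)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    eta1, eta2 = to_natural(means, variances)
    return from_natural(w @ eta1, w @ eta2)

def performance_weights(costs, temperature=1.0):
    # Softmax over negative costs: lower cost gives larger weight (ASSUMED scheme)
    z = -(np.asarray(costs, dtype=float) - np.min(costs)) / temperature
    w = np.exp(z)
    return w / w.sum()

def reset_worst_workers(means, variances, costs, n_reset=1, radius=0.5, seed=0):
    """Re-seed the n_reset highest-cost workers inside a ball around the centroid mean."""
    rng = np.random.default_rng(seed)
    means = np.array(means, dtype=float)          # copies, inputs stay untouched
    variances = np.array(variances, dtype=float)
    c_mean, c_var = bregman_centroid(means, variances, performance_weights(costs))
    for i in np.argsort(costs)[-n_reset:]:
        direction = rng.normal(size=c_mean.shape)
        direction /= np.linalg.norm(direction)
        # Radius drawn so that points are uniform inside the ball
        step = radius * rng.uniform() ** (1.0 / c_mean.size)
        means[i] = c_mean + step * direction
        variances[i] = c_var
    return means, variances, (c_mean, c_var)

# Three hypothetical workers in 2-D; the third one is far away and performs poorly
means = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
variances = np.ones_like(means)
new_means, new_vars, centroid = reset_worst_workers(means, variances, [1.0, 0.5, 50.0])
```

Only the poorly performing worker is moved; the others keep their own distributions, which is what lets the ensemble retain several modes while still sharing information through the centroid.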

Problem

Research questions and friction points this paper is trying to address.

Addresses premature convergence in CEM due to unimodal sampling

Enhances CEM with Bregman centroids for diversity control

Improves convergence and solution quality in MBRL tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bregman centroids for diversity control

Trust region sampling for worker updates

Seamless integration with standard CEM
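
For reference, the standard identities behind the Bregman centroid and its link to exponential families can be written out as below. These are textbook facts rather than formulas quoted from the paper; the symbols $\phi$, $A$, $\theta_i$, $w_i$, and $K$ are notation assumed here, and the paper may use a different side of the divergence or a different parameterization.

```latex
% Bregman divergence generated by a strictly convex, differentiable \phi
D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle

% Exponential family with natural parameter \theta and log-partition function A
p_\theta(x) = h(x)\exp\big(\langle \theta, T(x) \rangle - A(\theta)\big),
\qquad
\mathrm{KL}\big(p_{\theta_1} \,\|\, p_{\theta_2}\big) = D_A(\theta_2, \theta_1)

% Weighted centroid (weights w_i \ge 0, \sum_i w_i = 1), minimizing over the second argument
\arg\min_{c} \sum_{i=1}^{K} w_i\, D_A(\theta_i, c) = \sum_{i=1}^{K} w_i\, \theta_i
```

The last identity is why aggregating workers can be cheap: under this choice, the centroid is a weighted average of the workers' natural parameters.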
