🤖 AI Summary
The Cross-Entropy Method (CEM), as used in model-based reinforcement learning (MBRL), often converges prematurely in multimodal optimization landscapes because it samples from a unimodal distribution. This paper proposes Bregman Centroid Guided CEM ($\mathcal{BC}$-EvoCEM), an ensemble CEM variant that incorporates Bregman centroids into the framework. It computes a performance-weighted Bregman centroid across the ensemble's CEM workers and uses it to redirect the least-contributing workers while controlling diversity. Because Bregman divergences are dual to exponential-family distributions, the method is theoretically consistent and plugs into existing MBRL pipelines without modification. Replacement workers are sampled within a trust region around the centroid to keep updates stable. On synthetic benchmarks, cluttered-environment navigation, and end-to-end MBRL tasks, $\mathcal{BC}$-EvoCEM converges faster and finds higher-quality solutions than standard CEM, with negligible computational overhead.
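To make the centroid step concrete, here is a minimal sketch under assumptions the summary does not state: each CEM worker is a diagonal Gaussian over action sequences, performance weights come from a softmax over worker costs (the `temperature` parameter is hypothetical), and the centroid is the distribution q minimizing the weighted sum of KL(p_i || q). Because KL between exponential-family members is a Bregman divergence, this centroid reduces to a weighted average of the expectation parameters (mean and second moment), i.e. moment matching. The paper may use a different divergence direction or weighting scheme; all function names here are illustrative.

```python
import numpy as np

def performance_weights(costs, temperature=1.0):
    """Softmax over negative costs: lower-cost workers get larger weight (assumed scheme)."""
    z = -np.asarray(costs, dtype=float) / temperature
    z -= z.max()  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()

def kl_centroid_diag_gaussian(means, variances, weights):
    """Centroid of diagonal Gaussians minimizing sum_i w_i * KL(p_i || q).

    In expectation parameters (mu, sigma^2 + mu^2) this Bregman centroid is the
    weighted arithmetic mean, which amounts to moment matching the weighted mixture.
    """
    means = np.asarray(means, dtype=float)        # shape (K, d): one row per worker
    variances = np.asarray(variances, dtype=float)
    w = np.asarray(weights, dtype=float)[:, None]  # shape (K, 1) for broadcasting
    mu_c = (w * means).sum(axis=0)
    second_moment = (w * (variances + means ** 2)).sum(axis=0)
    var_c = second_moment - mu_c ** 2
    return mu_c, var_c

def kl_diag(mu_p, var_p, mu_q, var_q):
    """KL(N(mu_p, diag var_p) || N(mu_q, diag var_q))."""
    return 0.5 * np.sum(np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

# Usage: two one-dimensional workers with equal cost.
means = np.array([[0.0], [2.0]])
variances = np.array([[1.0], [1.0]])
w = performance_weights([1.0, 1.0])  # equal costs give equal weights
mu_c, var_c = kl_centroid_diag_gaussian(means, variances, w)
# mu_c is [1.0]; var_c is [2.0], wider than either worker because it covers both modes
```

The centroid's variance grows to span both workers, which is the moment-matching behaviour of this particular KL direction; averaging natural parameters instead (the reverse KL direction) would yield a narrower, precision-weighted centroid.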
📝 Abstract
The Cross-Entropy Method (CEM) is a widely adopted trajectory optimizer in model-based reinforcement learning (MBRL), but its unimodal sampling strategy often leads to premature convergence in multimodal landscapes. In this work, we propose Bregman Centroid Guided CEM ($\mathcal{BC}$-EvoCEM), a lightweight enhancement to ensemble CEM that leverages Bregman centroids for principled information aggregation and diversity control. $\mathcal{BC}$-EvoCEM computes a performance-weighted Bregman centroid across CEM workers and updates the least contributing ones by sampling within a trust region around the centroid. Leveraging the duality between Bregman divergences and exponential family distributions, we show that $\mathcal{BC}$-EvoCEM integrates seamlessly into standard CEM pipelines with negligible overhead. Empirical results on synthetic benchmarks, a cluttered navigation task, and full MBRL pipelines demonstrate that $\mathcal{BC}$-EvoCEM enhances both convergence and solution quality, providing a simple yet effective upgrade for CEM.
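The worker-update step can be sketched as follows, again under hypothetical choices the abstract leaves open: the least contributing workers are taken to be those with the smallest performance weights, the trust region is a KL ball of radius `kl_radius` around the centroid with the covariance held at the centroid's, and the centroid `(mu_c, var_c)` is supplied by a separate centroid computation. With a shared diagonal covariance, KL(N(m, var_c) || N(mu_c, var_c)) equals half the squared Mahalanobis distance between m and mu_c, so sampling uniformly inside the corresponding ellipsoid satisfies the bound exactly. `refresh_weakest_workers` is an illustrative name, not the paper's API.

```python
import numpy as np

def refresh_weakest_workers(means, variances, weights, mu_c, var_c,
                            n_replace=1, kl_radius=0.5, rng=None):
    """Re-initialize the n_replace lowest-weight workers inside a KL trust region.

    Each new worker is N(m, diag var_c) with
    KL(N(m, var_c) || N(mu_c, var_c)) = 0.5 * sum((m - mu_c)**2 / var_c) <= kl_radius.
    """
    rng = np.random.default_rng() if rng is None else rng
    means = np.array(means, dtype=float)          # copy so the caller's arrays are untouched
    variances = np.array(variances, dtype=float)
    weakest = np.argsort(weights)[:n_replace]     # least contributing workers
    d = means.shape[1]
    max_radius = np.sqrt(2.0 * kl_radius)         # Mahalanobis radius equivalent to the KL bound
    for k in weakest:
        direction = rng.standard_normal(d)
        direction /= np.linalg.norm(direction)    # uniform direction on the unit sphere
        radius = max_radius * rng.uniform() ** (1.0 / d)  # uniform radius within the ball
        means[k] = mu_c + radius * np.sqrt(var_c) * direction
        variances[k] = var_c
    return means, variances, weakest

# Usage: the third worker has almost no weight and is pulled back near the centroid.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [3.0, -1.0], [10.0, 10.0]])
variances = np.ones_like(means)
weights = np.array([0.6, 0.35, 0.05])
mu_c, var_c = np.array([1.0, 0.0]), np.array([1.5, 0.8])  # e.g. from a Bregman centroid step
new_means, new_vars, replaced = refresh_weakest_workers(
    means, variances, weights, mu_c, var_c, n_replace=1, kl_radius=0.5, rng=rng)
```

Holding the covariance fixed keeps the KL constraint in closed form; a trust region over both mean and covariance would need a numerical projection instead.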