SAC-MoE: Reinforcement Learning with Mixture-of-Experts for Control of Hybrid Dynamical Systems with Uncertainty

📅 2025-11-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Controlling hybrid dynamical systems, such as legged robots and autonomous vehicles, is challenging when mode switches are driven by latent variables, because the continuous dynamics are tightly coupled to unobservable discrete events. Conventional model-based methods typically do not account for this uncertainty, while standard model-free reinforcement learning generalizes poorly across abrupt mode switches. To address this, we propose SAC-MoE: a Soft Actor-Critic architecture whose actor is a Mixture-of-Experts (MoE), with a learned router that adaptively selects among specialized policy experts conditioned on inferred latent dynamic modes. We further introduce a curriculum learning strategy that prioritizes data collection in challenging settings to improve generalization to unseen modes and switching locations. To our knowledge, SAC-MoE is the first framework to enable latent-aware adaptive policy routing within SAC. Simulation studies on hybrid autonomous racing and legged locomotion tasks show up to 6× improvement over baselines in zero-shot generalization to unseen environments.

📝 Abstract
Hybrid dynamical systems result from the interaction of continuous-variable dynamics with discrete events and encompass various systems such as legged robots, vehicles, and aircraft. Challenges arise when the system's modes are characterized by unobservable (latent) parameters and the events that cause the system dynamics to switch between modes are also unobservable. Model-based control approaches typically do not account for such uncertainty in the hybrid dynamics, while standard model-free RL methods fail to account for abrupt mode switches, leading to poor generalization. To overcome this, we propose SAC-MoE, which models the actor of the Soft Actor-Critic (SAC) framework as a Mixture-of-Experts (MoE) with a learned router that adaptively selects among learned experts. To further improve robustness, we develop a curriculum-based training algorithm that prioritizes data collection in challenging settings, allowing better generalization to unseen modes and switching locations. Simulation studies in hybrid autonomous racing and legged locomotion tasks show that SAC-MoE outperforms baselines (by up to 6x) in zero-shot generalization to unseen environments. Our curriculum strategy consistently improves performance across all evaluated policies. Qualitative analysis shows that the interpretable MoE router activates different experts for distinct latent modes.
Problem

Research questions and friction points this paper is trying to address.

Control hybrid dynamical systems with unobservable latent parameters and mode switches
Address poor generalization of standard RL methods during abrupt mode transitions
Improve robustness to unseen environments and switching locations through curriculum learning
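
To make the problem setting concrete, the following is a minimal sketch of a hybrid system with a latent mode switch. It is a toy illustration and not one of the paper's environments: the class `HybridCar`, the friction values, and the switching rule are all assumptions chosen for brevity. The controller would observe only position and velocity, while the switch location and per-mode friction stay hidden.

```python
from dataclasses import dataclass


@dataclass
class HybridCar:
    """Toy 1-D point mass whose friction switches at a hidden location.

    Everything here is hypothetical: the names, the friction values, and
    the position-triggered switching rule are illustrative assumptions.
    """
    switch_position: float                 # latent: where the surface changes
    friction_modes: tuple = (0.1, 0.8)     # latent: damping coefficient per mode
    dt: float = 0.05                       # integration step in seconds

    def mode(self, position: float) -> int:
        # Discrete event: crossing switch_position changes the active mode.
        return 0 if position < self.switch_position else 1

    def step(self, position: float, velocity: float, action: float):
        # Continuous dynamics inside the active mode, integrated with a
        # simple Euler-style update (velocity first, then position).
        friction = self.friction_modes[self.mode(position)]
        accel = action - friction * velocity
        velocity = velocity + self.dt * accel
        position = position + self.dt * velocity
        return position, velocity


def rollout(env: HybridCar, steps: int = 200, action: float = 1.0):
    """Apply a constant action and return the final (position, velocity)."""
    position, velocity = 0.0, 0.0
    for _ in range(steps):
        position, velocity = env.step(position, velocity, action)
    return position, velocity
```

Running `rollout` on two instances that differ only in `switch_position` gives different trajectories under the same action sequence, which is the kind of variation a single policy trained on a few switch locations may not generalize to.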
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts actor in Soft Actor-Critic framework
Learned router adaptively selects among experts
Curriculum training prioritizes challenging data collection
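
The two ideas above can be sketched in a few lines of plain Python. This is a hedged illustration, not the authors' implementation: the softmax gating, the `top_k` selection, the linear experts, and the return-based curriculum weighting in `curriculum_probs` are all assumptions, since the summary does not specify them. A full SAC actor would also output a log-standard-deviation and apply tanh squashing, which is omitted here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def moe_actor_mean(obs, router, experts, top_k=1):
    """Route an observation to the top-k experts and blend their action means.

    `router` and each entry of `experts` are dicts with keys "W" and "b"
    (a hypothetical parameter layout used only for this sketch).
    """
    gate = softmax(linear(router["W"], router["b"], obs))
    # Keep the k experts with the largest gate weight.
    chosen = sorted(range(len(gate)), key=lambda i: gate[i], reverse=True)[:top_k]
    norm = sum(gate[i] for i in chosen)
    action_dim = len(experts[0]["b"])
    mean = [0.0] * action_dim
    for i in chosen:
        out = linear(experts[i]["W"], experts[i]["b"], obs)
        for j in range(action_dim):
            # Renormalize gate weights over the selected experts only.
            mean[j] += (gate[i] / norm) * out[j]
    return mean, gate


def curriculum_probs(avg_returns, temperature=1.0):
    """Sampling probabilities over training settings, favoring low returns.

    Assumption: a setting is "challenging" when its recent average return
    is low, so it is sampled more often for data collection.
    """
    return softmax([-r / temperature for r in avg_returns])
```

Returning the gate weights alongside the action mean makes it possible to inspect which expert is active for a given observation, in the spirit of the qualitative router analysis mentioned in the abstract.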
Leroy D'Souza
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada
Akash Karthikeyan
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada
Yash Vardhan Pant
Assistant Professor, ECE, University of Waterloo
Control Theory, Robotics, Machine Learning, Formal Methods, Optimization
Sebastian Fischmeister
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada