🤖 AI Summary
This work addresses the limited robustness of reinforcement learning (RL) policies in directed controller synthesis, which often stems from anisotropic generalization—where policies are effective only in localized regions of the parameter space. To overcome this, the authors propose a Soft Mixture-of-Experts (Soft MoE) framework that treats anisotropy as complementary expertise among multiple RL agents. A prior-confidence gating mechanism dynamically blends these expert policies, enabling collaborative exploration and broader coverage of the parameter space. This study presents the first application of Soft MoE to directed controller synthesis, significantly improving generalization and success rates on large, previously unseen instances. Evaluated on an air traffic control benchmark, the approach substantially expands the solvable region of the parameter space and demonstrates markedly superior robustness compared to single-expert strategies.
📝 Abstract
On-the-fly Directed Controller Synthesis (OTF-DCS) mitigates state-space explosion by incrementally exploring the system, and it relies critically on an exploration policy to guide the search efficiently. Recent reinforcement learning (RL) approaches learn such policies and achieve promising zero-shot generalization from small training instances to larger unseen ones. However, a fundamental limitation is anisotropic generalization: an RL policy performs strongly only in a specific region of the domain-parameter space while remaining fragile elsewhere, due to training stochasticity and trajectory-dependent bias. To address this, we propose a Soft Mixture-of-Experts framework that treats these anisotropic behaviors as complementary specializations and combines multiple RL experts via a prior-confidence gating mechanism. Evaluation on the Air Traffic benchmark shows that Soft MoE substantially expands the solvable parameter space and improves robustness compared to any single expert.
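The gating idea described above can be sketched as a softmax-weighted mixture of per-expert action distributions, where each expert's gate score reflects a prior confidence for the current instance. This is a minimal illustrative sketch, not the paper's implementation: the shapes, the `gate_scores` input, and the function names are assumptions for illustration only.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax
    z = np.exp(x - np.max(x))
    return z / z.sum()

def blend_policies(expert_logits, gate_scores):
    """Soft-MoE-style blending (hypothetical sketch).

    expert_logits: (n_experts, n_actions) per-expert scores over
                   candidate exploration actions.
    gate_scores:   (n_experts,) prior-confidence scores; the paper's
                   actual gating inputs may differ.
    Returns a single mixture distribution over actions.
    """
    weights = softmax(gate_scores)                    # soft gate: every expert contributes
    probs = np.stack([softmax(l) for l in expert_logits])
    return weights @ probs                            # convex combination of expert policies

# Toy example: two experts scoring three candidate transitions to expand.
expert_logits = np.array([[2.0, 0.5, 0.1],
                          [0.2, 1.5, 2.5]])
gate_scores = np.array([1.0, 0.0])                    # expert 0 trusted more on this instance
policy = blend_policies(expert_logits, gate_scores)
```

Because the gate is soft rather than hard, an expert that is weak on the current instance still contributes mass, which is what allows the mixture to cover regions where no single expert is reliable.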