Robust Exploration in Directed Controller Synthesis via Reinforcement Learning with Soft Mixture-of-Experts

📅 2026-02-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited robustness of reinforcement learning (RL) policies in directed controller synthesis, which often stems from anisotropic generalization—where policies are effective only in localized regions of the parameter space. To overcome this, the authors propose a Soft Mixture-of-Experts (Soft MoE) framework that treats anisotropy as complementary expertise among multiple RL agents. A prior-confidence gating mechanism dynamically blends these expert policies, enabling collaborative exploration and broader coverage of the parameter space. The study presents the first application of Soft MoE to directed controller synthesis, improving generalization and success rates on large, previously unseen instances. Evaluated on an air traffic control benchmark, the approach substantially expands the solvable region of the parameter space and demonstrates markedly superior robustness compared to single-expert strategies.

📝 Abstract
On-the-fly Directed Controller Synthesis (OTF-DCS) mitigates state-space explosion by incrementally exploring the system and relies critically on an exploration policy to guide search efficiently. Recent reinforcement learning (RL) approaches learn such policies and achieve promising zero-shot generalization from small training instances to larger unseen ones. However, a fundamental limitation is anisotropic generalization, where an RL policy exhibits strong performance only in a specific region of the domain-parameter space while remaining fragile elsewhere due to training stochasticity and trajectory-dependent bias. To address this, we propose a Soft Mixture-of-Experts framework that combines multiple RL experts via a prior-confidence gating mechanism and treats these anisotropic behaviors as complementary specializations. The evaluation on the Air Traffic benchmark shows that Soft-MoE substantially expands the solvable parameter space and improves robustness compared to any single expert.
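The gating idea in the abstract—blending several anisotropic experts by prior confidence—can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the softmax gating, and the per-expert confidence signal are all assumptions for illustration; the paper's exact gating computation may differ.

```python
import numpy as np

def soft_moe_policy(action_scores, confidences):
    """Blend per-expert action scores into one exploration policy.

    action_scores: (n_experts, n_actions) array — each expert's
                   preference over candidate transitions to expand.
    confidences:   (n_experts,) array — hypothetical prior confidence
                   of each expert on the current instance.
    """
    # Gate: normalize confidences into mixture weights (softmax).
    w = np.exp(confidences - confidences.max())
    w /= w.sum()
    # Soft mixture: confidence-weighted blend of expert scores.
    blended = w @ action_scores            # shape (n_actions,)
    # Convert blended scores into a sampling distribution.
    p = np.exp(blended - blended.max())
    return p / p.sum()

# Toy example: two experts strong in different regions; expert 0
# is trusted more on this instance, so its preferred action wins.
scores = np.array([[2.0, 0.1, 0.1],
                   [0.1, 0.1, 2.0]])
conf = np.array([1.5, 0.5])
probs = soft_moe_policy(scores, conf)
```

The soft (weighted-blend) mixture keeps every expert's contribution rather than hard-selecting one, which is what lets complementary anisotropic specializations cover each other's weak regions.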
Problem

Research questions and friction points this paper is trying to address.

anisotropic generalization
reinforcement learning
controller synthesis
robust exploration
state-space explosion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Soft Mixture-of-Experts
Reinforcement Learning
Directed Controller Synthesis
Anisotropic Generalization
Robust Exploration
Toshihide Ubukata
Waseda University, Tokyo, 169-8050, Japan
Zhiyao Wang
Osaka University, Japan
Enhong Mu
Southwest University, China
Jialong Li
Waseda University
self-adaptive systems · requirement engineering · human-in-the-loop
Kenji Tei
Institute of Science Tokyo
software architecture · requirement engineering · self-adaptive systems · formal verification