🤖 AI Summary
Autonomous vessels struggle to simultaneously satisfy maritime traffic regulations and maintain robust navigation policies in complex, dynamic sea environments. Method: This paper proposes a counterexample-driven reinforcement learning framework that formalizes maritime rules using Signal Temporal Logic (STL), automatically synthesizes high-risk violation scenarios as adversarial training samples, and enables closed-loop policy optimization and compliance verification under formal constraints. Contribution/Results: By embedding STL specifications directly into the training process, the framework improves the agent's understanding of, and generalization to, dynamic navigational constraints via counterexample-guided learning. In dual-vessel open-sea navigation experiments, the method achieves a 23.6% improvement in rule compliance rate, generates more challenging and contextually relevant training scenarios, and yields policies with superior safety and robustness compared to baseline approaches.
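To make the STL idea concrete, here is a minimal, illustrative sketch (not the paper's implementation) of the quantitative robustness of a simple safety rule, G (distance > d_safe), over a sampled distance trace. A positive robustness means the rule holds on the trace; a negative value is a violation, and its magnitude indicates how badly the rule was broken, which is exactly the signal a falsifier can minimize. All names and numbers below are assumptions for illustration.

```python
def robustness_always_greater(signal, threshold):
    """Robustness of the STL formula G (signal > threshold):
    min over all samples of (signal[t] - threshold)."""
    return min(s - threshold for s in signal)

# Two hypothetical distance traces between own ship and a target vessel (metres).
compliant = [500.0, 420.0, 380.0, 450.0]   # never closer than 300 m
violating = [500.0, 310.0, 250.0, 400.0]   # dips below the 300 m safety margin

D_SAFE = 300.0
print(robustness_always_greater(compliant, D_SAFE))  # 80.0  -> rule satisfied
print(robustness_always_greater(violating, D_SAFE))  # -50.0 -> counterexample
```

A trace with negative robustness is precisely the kind of rule-violating scenario the framework feeds back into training as an adversarial sample.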
📝 Abstract
Compliance with maritime traffic rules is essential for the safe operation of autonomous vessels, yet training reinforcement learning (RL) agents to adhere to them is challenging. The behavior of RL agents is shaped by the training scenarios they encounter, but creating scenarios that capture the complexity of maritime navigation is non-trivial, and real-world data alone is insufficient. To address this, we propose a falsification-driven RL approach that generates adversarial training scenarios in which the vessel under test violates maritime traffic rules, expressed as signal temporal logic (STL) specifications. Our experiments on open-sea navigation with two vessels demonstrate that the proposed approach provides more relevant training scenarios and achieves more consistent rule compliance.
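The falsification step described above can be sketched as a search over scenario parameters for one that drives the rule's robustness negative. The toy loop below uses random search over a target vessel's bearing and speed with straight-line kinematics; the dynamics, parameter ranges, and safety margin are placeholders, not the paper's setup, and a real falsifier would use a more capable optimizer.

```python
import math
import random

D_SAFE = 0.5  # safety margin in nautical miles (illustrative)

def simulate_min_distance(bearing, speed, steps=50, dt=0.1):
    """Own ship heads east at unit speed from the origin; the target vessel
    starts 2 nm ahead and moves with the given bearing and speed.
    Returns the minimum separation over the horizon."""
    ox, oy = 0.0, 0.0
    tx, ty = 2.0, 0.0
    dmin = math.hypot(tx - ox, ty - oy)
    for _ in range(steps):
        ox += 1.0 * dt
        tx += speed * math.cos(bearing) * dt
        ty += speed * math.sin(bearing) * dt
        dmin = min(dmin, math.hypot(tx - ox, ty - oy))
    return dmin

def falsify(trials=200, seed=0):
    """Random-search falsifier: minimize robustness of G (distance > D_SAFE).
    Returns (robustness, bearing, speed) of the worst scenario found."""
    rng = random.Random(seed)
    worst = None
    for _ in range(trials):
        bearing = rng.uniform(0.0, 2 * math.pi)
        speed = rng.uniform(0.5, 2.0)
        rob = simulate_min_distance(bearing, speed) - D_SAFE
        if worst is None or rob < worst[0]:
            worst = (rob, bearing, speed)
    return worst

rob, bearing, speed = falsify()
print(f"worst robustness {rob:.3f} at bearing {bearing:.2f} rad, speed {speed:.2f}")
# A scenario with negative robustness is a counterexample: the target vessel's
# motion forces a safety-margin violation, making it a useful adversarial
# training sample for the RL policy.
```

Closing the loop, the RL agent would be retrained on the scenarios the falsifier discovers, and the falsifier rerun against the updated policy.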