🤖 AI Summary
This study addresses a key limitation of existing autonomous surface vessels (ASVs) operating in complex waterways. They lack knowledge-driven, interactive environmental cognition, which prevents them from translating visual perception into high-level navigational decisions that comply with maritime rules. To bridge this gap, the authors introduce WaterVideoQA, the first large-scale video question-answering benchmark tailored to diverse waterway scenarios (3,029 video clips across six waterway categories). They also propose NaviMind, a multi-agent neuro-symbolic system that integrates Adaptive Semantic Routing, Situation-Aware Hierarchical Reasoning, and Autonomous Self-Reflective Verification. The proposed approach substantially outperforms existing baselines and enables interpretable, trustworthy, regulation-compliant decision-making in dynamic maritime environments. The work moves ASVs beyond pattern recognition toward a new paradigm of rule-abiding cognitive reasoning.
📝 Abstract
While autonomous navigation has achieved remarkable success in passive perception (e.g., object detection and segmentation), it remains fundamentally constrained by the absence of knowledge-driven, interactive environmental cognition. In the high-stakes domain of maritime navigation, bridging the gap between raw visual perception and complex cognitive reasoning is not merely an enhancement but a critical prerequisite for Autonomous Surface Vessels (ASVs) to execute safe and precise maneuvers. To this end, we present WaterVideoQA, the first large-scale, comprehensive Video Question Answering (VideoQA) benchmark specifically designed for all-waterway environments. The benchmark comprises 3,029 video clips across six distinct waterway categories. It incorporates diverse conditions such as variable lighting and dynamic weather to rigorously stress-test ASV capabilities across a five-tier hierarchical cognitive framework. Furthermore, we introduce NaviMind, a multi-agent neuro-symbolic system designed for open-ended maritime reasoning. By combining Adaptive Semantic Routing, Situation-Aware Hierarchical Reasoning, and Autonomous Self-Reflective Verification, NaviMind moves ASVs beyond superficial pattern matching toward regulation-compliant, interpretable decision-making. Experimental results show that our framework significantly outperforms existing baselines, establishing a new paradigm for intelligent, trustworthy interaction in dynamic maritime environments.