WaterVideoQA: ASV-Centric Perception and Rule-Compliant Reasoning via Multi-Modal Agents

📅 2026-02-26
🤖 AI Summary
This study addresses the lack of knowledge-driven, interactive environmental cognition in existing autonomous surface vessels (ASVs) operating in complex waterways. This gap prevents them from translating visual perception into high-level navigational decisions that comply with maritime rules. To bridge it, the authors introduce WaterVideoQA, the first large-scale video question-answering benchmark tailored to diverse aquatic scenarios. They also propose NaviMind, a multi-agent neuro-symbolic system that integrates adaptive semantic routing, situation-aware hierarchical reasoning, and autonomous self-reflective verification. The proposed approach substantially outperforms current baselines and enables explainable, trustworthy, and regulation-compliant decision-making in dynamic maritime environments. This work moves ASVs beyond pattern recognition toward a new paradigm of rule-abiding cognitive reasoning.

📝 Abstract
While autonomous navigation has achieved remarkable success in passive perception (e.g., object detection and segmentation), it remains fundamentally constrained by a lack of knowledge-driven, interactive environmental cognition. In the high-stakes domain of maritime navigation, the ability to bridge the gap between raw visual perception and complex cognitive reasoning is not merely an enhancement but a critical prerequisite for Autonomous Surface Vessels (ASVs) to execute safe and precise maneuvers. To this end, we present WaterVideoQA, the first large-scale, comprehensive Video Question Answering benchmark specifically engineered for all-waterway environments. This benchmark encompasses 3,029 video clips across six distinct waterway categories. It integrates multifaceted variables such as volatile lighting and dynamic weather to rigorously stress-test ASV capabilities across a five-tier hierarchical cognitive framework. Furthermore, we introduce NaviMind, a pioneering multi-agent neuro-symbolic system designed for open-ended maritime reasoning. By synergizing Adaptive Semantic Routing, Situation-Aware Hierarchical Reasoning, and Autonomous Self-Reflective Verification, NaviMind shifts ASVs from superficial pattern matching to regulation-compliant, interpretable decision-making. Experimental results demonstrate that our framework significantly outperforms existing baselines, establishing a new paradigm for intelligent, trustworthy interaction in dynamic maritime environments.
Problem

Research questions and friction points this paper is trying to address.

Autonomous Surface Vessels
Video Question Answering
Cognitive Reasoning
Maritime Navigation
Environmental Cognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

WaterVideoQA
multi-agent neuro-symbolic system
Autonomous Surface Vessels
hierarchical cognitive reasoning
rule-compliant decision-making
Runwei Guan
Hong Kong University of Science and Technology (Guangzhou) / Founder of FertiTech AI
Multi-Modal Learning, Unmanned Surface Vessel, Radar Perception, AI Medicine
Shaofeng Liang
Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Ningwei Ouyang
School of Advanced Technology, Xi’an Jiaotong-Liverpool University, Suzhou, China
Weichen Fei
School of Artificial Intelligence, Nanjing University, Nanjing, China
Shanliang Yao
Yancheng Institute of Technology
Autonomous Driving, Intelligent Vehicles, Radar-Camera Fusion, Maritime Perception
Wei Dai
Civil Aviation University of China
Air Traffic Management, UAS Traffic Management, Urban Air Mobility
Chenhao Ge
School of Engineering, Stanford University, Palo Alto, USA
Penglei Sun
Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Xiaohui Zhu
Xi'an Jiaotong-Liverpool University
Autonomous navigation, Robotics, AI applications, Environment monitoring, AIoT
Tao Huang
Centre for AI and Data Science Innovation and the School of Science and Engineering, James Cook University, Smithfield, Australia
Ryan Wen Liu
Hubei Key Laboratory of Inland Shipping Technology (Wuhan University of Technology), Wuhan, China; School of Navigation, Wuhan University of Technology, Wuhan, China
Hui Xiong
Senior Scientist, Candela Corporation
Ultrafast dynamics, atomic molecular physics, free electron laser