CORAL: COntextual Reasoning And Local Planning in A Hierarchical VLM Framework for Underwater Monitoring

📅 2026-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing autonomous underwater monitoring approaches rely on geometric navigation and lack semantic understanding, while end-to-end vision-language model (VLM) systems suffer from high inference latency, ignore vehicle dynamics constraints, and accumulate path errors. This work proposes CORAL, a hierarchical framework that decouples high-level semantic reasoning from low-level reactive control: the VLM is responsible only for selecting semantic waypoints, while a dynamics-aware planner generates safe local trajectories. A geometric verification mechanism checks waypoint feasibility and triggers replanning when necessary. Compared with the previous state of the art, CORAL makes 57% fewer VLM calls, eliminates collisions, and improves monitoring coverage by 14.28 percentage points (17.85% relative).
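The absolute and relative coverage gains quoted above are tied together by the baseline coverage. The short Python sketch below shows that relationship, assuming the relative gain is defined as the absolute gain divided by the baseline coverage. The function name `implied_baseline` is hypothetical, and the baseline it returns is inferred from the two quoted figures, not a number reported in the summary.

```python
def implied_baseline(abs_gain_pp: float, rel_gain_pct: float) -> float:
    """Baseline coverage (in %) implied by an absolute gain (percentage points)
    and a relative gain (percent), assuming rel = abs / baseline."""
    return abs_gain_pp / (rel_gain_pct / 100.0)


# Figures quoted in the summary: +14.28 percentage points, +17.85% relative.
baseline = implied_baseline(14.28, 17.85)
improved = baseline + 14.28  # coverage after the gain, under the same assumption

print(f"implied baseline coverage: {baseline:.2f}%")
print(f"implied improved coverage: {improved:.2f}%")
```

Under this assumption the two figures are consistent with a baseline coverage near 80%; that value is derived here for illustration only.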

📝 Abstract
Oyster reefs are critical ecosystems that sustain biodiversity, filter water, and protect coastlines, yet they continue to decline globally. Restoring them requires regular underwater monitoring to assess reef health, a task that remains costly, hazardous, and limited when performed by human divers. Autonomous underwater vehicles (AUVs) offer a promising alternative, but existing AUVs rely on geometry-based navigation that cannot interpret scene semantics. Recent vision-language models (VLMs) enable semantic reasoning for intelligent exploration, but existing VLM-driven systems adopt an end-to-end paradigm that introduces three key limitations. First, the VLM must generate every navigation decision, so the vehicle frequently waits on inference. Second, VLMs do not model robot dynamics, which leads to collisions in cluttered environments. Third, limited self-correction allows small deviations to accumulate into large path errors. To address these limitations, we propose CORAL, a framework that decouples high-level semantic reasoning from low-level reactive control. The VLM provides high-level exploration guidance by selecting waypoints, while a dynamics-based planner handles low-level collision-free execution. A geometric verification module validates waypoints and triggers replanning when needed. Compared with the previous state of the art, CORAL improves coverage by 14.28 percentage points (17.85% relative), eliminates collisions, and requires 57% fewer VLM calls.
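The division of labor described in the abstract (a high-level waypoint proposer, a geometric feasibility check that triggers replanning, and a low-level executor) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the names `Obstacle`, `is_feasible`, `local_step`, and `run_mission`, the 2D circular-obstacle model, and the straight-line stepper that stands in for a dynamics-aware planner (it does not avoid obstacles itself). It is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass
class Obstacle:
    center: Point
    radius: float


def is_feasible(waypoint: Point, obstacles: List[Obstacle], clearance: float) -> bool:
    """Geometric check: the waypoint must keep `clearance` from every obstacle surface."""
    return all(math.dist(waypoint, o.center) - o.radius >= clearance for o in obstacles)


def local_step(pos: Point, goal: Point, max_step: float) -> Point:
    """Stand-in for a local planner: move straight toward the goal, at most `max_step` per tick."""
    d = math.dist(pos, goal)
    if d <= max_step:
        return goal
    s = max_step / d
    return (pos[0] + s * (goal[0] - pos[0]), pos[1] + s * (goal[1] - pos[1]))


def run_mission(start: Point,
                propose_waypoint: Callable[[Point], Point],
                obstacles: List[Obstacle],
                n_waypoints: int,
                clearance: float = 0.5,
                max_step: float = 0.25,
                max_retries: int = 5) -> Tuple[List[Point], int]:
    """Query the proposer once per waypoint, plus once per rejected candidate."""
    pos, calls, visited = start, 0, []
    for _ in range(n_waypoints):
        waypoint = None
        for _ in range(max_retries):
            candidate = propose_waypoint(pos)  # hypothetical stand-in for a VLM query
            calls += 1
            if is_feasible(candidate, obstacles, clearance):
                waypoint = candidate
                break  # verified; otherwise loop again to replan
        if waypoint is None:
            break  # no feasible waypoint within the retry budget
        while pos != waypoint:  # low-level execution needs no further proposer calls
            pos = local_step(pos, waypoint, max_step)
        visited.append(waypoint)
    return visited, calls


# Scripted proposer: the first candidate lies inside the obstacle and is rejected.
script = iter([(1.0, 1.2), (2.0, 0.0), (2.0, 2.0)])
visited, vlm_calls = run_mission(
    start=(0.0, 0.0),
    propose_waypoint=lambda pos: next(script),
    obstacles=[Obstacle(center=(1.0, 1.0), radius=0.3)],
    n_waypoints=2,
)
print(visited, vlm_calls)
```

In this toy setup the proposer is queried only when a new waypoint is needed or a candidate fails verification, while every intermediate motion step is handled locally. That is the structural reason a decoupled design can need fewer VLM calls than an end-to-end one.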
Problem

Research questions and friction points this paper is trying to address.

autonomous underwater vehicles
vision-language models
semantic reasoning
navigation
underwater monitoring
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical VLM
Semantic reasoning
Local planning
Geometric verification
Autonomous underwater monitoring
Zhenqi Wu
University of South Florida
Yuanjie Lu
George Mason University
Xuesu Xiao
George Mason University
Xiaomin Lin
Assistant Prof, University of South Florida
AI for good, Robotics for science, Robotics for good