🤖 AI Summary
Monitoring large-scale mortality events in temperature-sensitive marine organisms such as oysters, driven by ocean warming and acidification, is hindered by high cost, operational risk, and limited scalability. To address these challenges, this paper proposes an intelligent autonomous system for long-term, wide-area seabed monitoring. We introduce a domain-aware reasoning framework that integrates vision-language models (VLMs) to enable zero-shot semantic understanding, environment-adaptive navigation, and goal-driven decision-making. Crucially, the architecture tightly couples semantic perception with path planning, enabling fully autonomous, uninterrupted operation. Experiments show that in oyster-monitoring tasks our method reduces execution time by 31.5% and improves target coverage by 8.88% over baseline approaches, and in shipwreck detection it decreases motion steps by 27.5% while achieving 100% environmental coverage. These advances improve the efficiency, robustness, and scalability of underwater monitoring systems.
📝 Abstract
The ocean is warming and acidifying, increasing the risk of mass mortality events for temperature-sensitive shellfish such as oysters. This motivates the development of long-term monitoring systems. However, human labor is costly and long-duration underwater work is highly hazardous, favoring robotic solutions as a safer and more efficient option. This highlights the need for persistent, wide-area, and low-cost benthic monitoring, which in turn requires equipping underwater robots with an intelligent "brain" capable of real-time, environment-aware decisions without human intervention. To this end, we present DREAM, a Vision-Language Model (VLM)-guided autonomy framework for long-term underwater exploration and habitat monitoring. Our results show that the framework efficiently finds and explores target objects (e.g., oysters, shipwrecks) without prior location information. In the oyster-monitoring task, it takes 31.5% less time than the previous baseline to cover the same number of oysters. Compared to the vanilla VLM, it uses 23% fewer steps while covering 8.88% more oysters. In shipwreck scenes, the framework successfully explores and maps the wreck without collisions, requiring 27.5% fewer steps than the vanilla model and achieving 100% coverage, whereas the vanilla model averages 60.23% coverage in our shipwreck environments.