Assessing Vision-Language Models for Perception in Autonomous Underwater Robotic Software

📅 2026-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of reliable perception for autonomous underwater robots (AURs) operating in low-visibility, data-scarce, and high-noise underwater environments. For the first time from a software engineering perspective, it systematically evaluates the performance and uncertainty characteristics of multiple vision-language models (VLMs) on underwater debris detection tasks. Through empirical analysis combining quantitative metrics with uncertainty quantification methods, the work reveals significant differences among VLMs in both detection accuracy and confidence calibration under complex underwater conditions. These findings provide empirically grounded guidance for model selection in AUR perception systems and fill a gap in software engineering research on deploying VLMs in underwater robotics.
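The summary refers to confidence calibration, i.e. how well a model's reported confidence matches its actual accuracy. A minimal sketch of expected calibration error (ECE), a standard calibration metric, is shown below; the binning scheme and the toy confidence/correctness values are illustrative assumptions, not the paper's actual evaluation protocol.

```python
# Sketch: expected calibration error (ECE) over per-detection
# confidence scores and binary correctness labels (toy data).
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    conf = np.asarray(confidences, dtype=float)
    hit = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # |accuracy - confidence| in this bin, weighted by bin mass
            ece += mask.mean() * abs(hit[mask].mean() - conf[mask].mean())
    return ece

# Toy example of an overconfident detector
scores = [0.9, 0.95, 0.85, 0.9, 0.8]
labels = [1, 0, 1, 0, 1]
print(f"ECE = {expected_calibration_error(scores, labels):.2f}")
```

A well-calibrated model scores near zero; large values signal that reported confidence should not be trusted directly for downstream decisions in the AUR software.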

📝 Abstract
Autonomous Underwater Robots (AURs) operate in challenging underwater environments, including low visibility and harsh water conditions. Such conditions present challenges for software engineers developing perception modules for AUR software. To support these tasks, deep learning has been incorporated into AUR software. However, the unique challenges of underwater environments pose difficulties for deep learning models, which often rely on labeled data that is scarce and noisy. This may undermine the trustworthiness of AUR software that relies on perception modules. Vision-Language Models (VLMs) offer promising solutions for AUR software, as they generalize to unseen objects and remain robust in noisy conditions by inferring information from contextual cues. Despite this potential, their performance and uncertainty in underwater environments remain understudied from a software engineering perspective. Motivated by the needs of an industrial partner in assurance and risk management for maritime systems to assess the potential use of VLMs in this context, we present an empirical evaluation of VLM-based perception modules within AUR software. We assess their ability to detect underwater trash by computing performance, uncertainty, and the relationship between them, to enable software engineers to select appropriate VLMs for their AUR software.
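The abstract's goal of relating performance to uncertainty is commonly operationalized as a risk-coverage analysis: defer predictions whose confidence falls below a threshold and measure error on the retained subset. The sketch below assumes per-detection confidence scores and binary correctness labels with toy values; it is an illustration of the general technique, not the paper's method.

```python
# Sketch: risk-coverage curve. Raising the confidence threshold
# trades coverage (fraction of inputs handled) for risk (error
# rate on the handled subset). Toy data, not the paper's.
import numpy as np

def risk_coverage(confidences, correct, thresholds):
    conf = np.asarray(confidences, dtype=float)
    hit = np.asarray(correct, dtype=float)
    curve = []
    for t in thresholds:
        kept = conf >= t              # predictions the system accepts
        coverage = kept.mean()        # fraction not deferred
        risk = 1.0 - hit[kept].mean() if kept.any() else 0.0
        curve.append((t, coverage, risk))
    return curve

scores = [0.95, 0.9, 0.7, 0.6, 0.5]
labels = [1, 1, 1, 0, 0]
for t, cov, risk in risk_coverage(scores, labels, [0.0, 0.65, 0.8]):
    print(f"threshold={t:.2f}  coverage={cov:.2f}  risk={risk:.2f}")
```

If risk drops sharply as coverage shrinks, the model's uncertainty is informative; a flat curve means confidence carries little signal for triaging detections in AUR software.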
Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models
Autonomous Underwater Robots
Perception
Uncertainty
Software Engineering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Models
Autonomous Underwater Robots
Uncertainty Quantification
Perception Software
Empirical Evaluation