🤖 AI Summary
This work addresses the challenge of efficiently locating sparse biological targets—such as specific coral species—in energy-constrained autonomous underwater vehicle (AUV) missions within visually sparse environments like coral reefs. The study introduces a novel approach that leverages environmental visual context, including co-occurring habitat features, as a guiding signal to drive adaptive search when direct target observations are absent. By integrating DINOv2 embeddings for patch-level one-shot online detection, the method simultaneously identifies both targets and their contextual cues, enabling dynamic path planning informed by real-time scene understanding. Experiments on real AUV imagery demonstrate that the proposed strategy discovers up to 75% of sparse targets in approximately half the time required by exhaustive coverage approaches, significantly outperforming baseline methods that rely solely on direct target detection.
📝 Abstract
Autonomous underwater vehicles (AUVs) are increasingly used to survey coral reefs, yet efficiently locating specific coral species of interest remains difficult: target species are often sparsely distributed across the reef, and an AUV with limited battery life cannot afford to search everywhere. When detections of the target itself are too sparse to provide directional guidance, the robot benefits from an additional signal to decide where to look next. We propose using the visual environmental context, the habitat features that tend to co-occur with a target species, as that signal. Because context features are spatially denser and often vary more smoothly than target detections, we hypothesize that a reward function based on broader environmental context will enable adaptive planners to make better decisions about where to go next, even in regions where no target has yet been observed. Starting from a single labeled image, our method uses patch-level DINOv2 embeddings to perform one-shot detection of both the target species and its surrounding context online. We validate our approach using real imagery collected by an AUV at two reef sites in St. John, U.S. Virgin Islands, simulating the robot's motion offline. Our results demonstrate that one-shot detection combined with adaptive context modeling enables efficient autonomous surveying, sampling up to 75% of the targets in roughly half the time required by exhaustive coverage when the target is sparsely distributed, and outperforming search strategies that use only target detections.
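The core mechanism described above, one-shot detection via patch-embedding similarity plus a context-weighted reward, can be sketched in a few lines. This is a toy illustration only: random unit vectors stand in for real DINOv2 patch embeddings (so it runs without a model download), and the threshold `TAU`, mixing weight `ALPHA`, and grid setup are assumed for the example, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 32  # toy embedding dimension (DINOv2 patch embeddings are larger)


def normalize(x):
    """Project vectors onto the unit sphere so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


# One-shot prototypes: in the real method these come from patches of a
# single labeled image; here they are random unit vectors.
target_proto = normalize(rng.normal(size=D))
context_proto = normalize(rng.normal(size=D))

# A 10x10 grid of image patches, each with an embedding. Bias two patches
# toward the prototypes to simulate a target sighting and nearby context.
H, W = 10, 10
patches = normalize(rng.normal(size=(H, W, D)))
patches[2, 3] = normalize(0.9 * target_proto + 0.1 * patches[2, 3])
patches[2, 4] = normalize(0.8 * context_proto + 0.2 * patches[2, 4])

# One-shot detection: cosine similarity against each prototype, thresholded.
target_sim = patches @ target_proto   # (H, W) cosine similarities
context_sim = patches @ context_proto
TAU = 0.6                             # assumed detection threshold
target_hits = target_sim > TAU

# Context-weighted reward: where no target is detected, the denser and
# smoother context signal still gives the planner a gradient to follow.
ALPHA = 0.5                           # assumed context weight
reward = np.where(target_hits, target_sim, ALPHA * context_sim)
best = np.unravel_index(np.argmax(reward), reward.shape)
print("next waypoint (patch index):", best)
```

A greedy planner would steer toward `best` and re-score as new imagery arrives; the paper's adaptive planner makes this decision online from the same two similarity maps.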