ORCA: Object Recognition and Comprehension for Archiving Marine Species

📅 2025-12-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Marine visual understanding is hindered by scarce annotated data and ill-defined task formulations. To address this, we introduce ORCA, the first multimodal benchmark dedicated to marine species archival, comprising 478 species, 14,647 images, 42,217 detection bounding boxes, and 22,321 expert-verified instance captions. ORCA supports three core tasks: object detection (in both closed-set and open-vocabulary settings), instance-level captioning, and visual grounding. We propose a morphology-guided fine-grained annotation paradigm and establish a unified closed-set/open-vocabulary evaluation framework, exposing key challenges including species diversity and morphological ambiguity. Comprehensive evaluation across 18 state-of-the-art models, including YOLO, GLIP, GIT, and Qwen-VL, reveals severe performance bottlenecks in marine contexts (e.g., detection mAP = 32.1%, open-vocabulary accuracy < 41.5%), underscoring ORCA’s role as a testbed for domain-specific methods.

📝 Abstract

Marine visual understanding is essential for monitoring and protecting marine ecosystems, enabling automatic and scalable biological surveys. However, progress is hindered by limited training data and the lack of a systematic task formulation that aligns domain-specific marine challenges with well-defined computer vision tasks, which limits effective model application. To address this gap, we present ORCA, a multi-modal benchmark for marine research comprising 14,647 images from 478 species, with 42,217 bounding box annotations and 22,321 expert-verified instance captions. The dataset provides fine-grained visual and textual annotations that capture morphology-oriented attributes across diverse marine species. To catalyze methodological advances, we evaluate 18 state-of-the-art models on three tasks: object detection (closed-set and open-vocabulary), instance captioning, and visual grounding. Results highlight key challenges, including species diversity, morphological overlap, and specialized domain demands, underscoring the difficulty of marine understanding. ORCA thus establishes a comprehensive benchmark to advance research in the marine domain. Project Page: http://orca.hkustvgd.com/.
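
To put the headline counts in perspective, the short sketch below derives average annotation densities from the four figures quoted in the abstract. The class and method names are hypothetical and chosen only for illustration. The results are dataset-wide averages, not per-species distributions, which the abstract does not report. The captions-per-box ratio additionally assumes each instance caption is attached to at most one bounding box, which the abstract implies but does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetCounts:
    """Headline counts quoted in the ORCA abstract (hypothetical container)."""
    species: int
    images: int
    boxes: int
    captions: int

    def images_per_species(self) -> float:
        # Mean number of images per species (the real split is likely uneven).
        return self.images / self.species

    def boxes_per_image(self) -> float:
        # Mean number of annotated bounding boxes per image.
        return self.boxes / self.images

    def captions_per_box(self) -> float:
        # Share of boxes with a caption, ASSUMING at most one caption per box.
        return self.captions / self.boxes


# Figures taken directly from the abstract.
orca = DatasetCounts(species=478, images=14_647, boxes=42_217, captions=22_321)

if __name__ == "__main__":
    print(f"images per species: {orca.images_per_species():.1f}")
    print(f"boxes per image:    {orca.boxes_per_image():.2f}")
    print(f"captions per box:   {orca.captions_per_box():.2f}")
```

Under these assumptions, the averages come to roughly 31 images per species, just under 3 boxes per image, and captions for about half of the annotated boxes.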

Problem

Research questions and friction points this paper is trying to address.

Develops a benchmark for marine species recognition and comprehension

Addresses limited training data and systematic task formulation in marine vision

Evaluates models on detection, captioning, and grounding tasks for marine ecosystems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal benchmark with fine-grained annotations

Evaluates 18 models on three vision tasks

Addresses marine species diversity and domain challenges

🔎 Similar Papers

BenthicNet: A global compilation of seafloor images for deep learning applications