ORCA: Object Recognition and Comprehension for Archiving Marine Species

📅 2025-12-24
🤖 AI Summary
Marine visual understanding is hindered by scarce annotated data and ill-defined task formulations. To address this, we introduce ORCA, a multimodal benchmark dedicated to marine species archival, comprising 478 species, 14,647 images, 42,217 detection bounding boxes, and 22,321 expert-verified instance captions. ORCA supports three core tasks: object detection (closed-set and open-vocabulary), instance-level captioning, and visual grounding. We propose a morphology-guided fine-grained annotation paradigm and establish a unified closed-set/open-vocabulary evaluation framework, exposing key challenges including species diversity and morphological ambiguity. Comprehensive evaluation across 18 state-of-the-art models (including YOLO, GLIP, GIT, and Qwen-VL) reveals severe performance bottlenecks in marine contexts (e.g., detection mAP = 32.1%, open-vocabulary accuracy < 41.5%), underscoring the need for domain-specific methods.

📝 Abstract
Marine visual understanding is essential for monitoring and protecting marine ecosystems, enabling automatic and scalable biological surveys. However, progress is hindered by limited training data and the lack of a systematic task formulation that aligns domain-specific marine challenges with well-defined computer vision tasks, thereby limiting effective model application. To address this gap, we present ORCA, a multi-modal benchmark for marine research comprising 14,647 images from 478 species, with 42,217 bounding box annotations and 22,321 expert-verified instance captions. The dataset provides fine-grained visual and textual annotations that capture morphology-oriented attributes across diverse marine species. To catalyze methodological advances, we evaluate 18 state-of-the-art models on three tasks: object detection (closed-set and open-vocabulary), instance captioning, and visual grounding. Results highlight key challenges, including species diversity, morphological overlap, and specialized domain demands, underscoring the difficulty of marine understanding. ORCA thus establishes a comprehensive benchmark to advance research in the marine domain. Project Page: http://orca.hkustvgd.com/.
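To give a rough sense of the dataset's density, the counts quoted in the abstract can be turned into a few averages. The sketch below is illustrative only: the constant and function names are hypothetical, the images-per-species figure is a plain mean that ignores images containing several species, and the captions-per-box ratio assumes each instance caption is attached to a single annotated box, which the abstract does not state explicitly.

```python
# Counts as stated in the abstract.
NUM_IMAGES = 14_647
NUM_SPECIES = 478
NUM_BOXES = 42_217
NUM_CAPTIONS = 22_321


def summarize_orca_counts():
    """Return simple averages derived from the published counts.

    These are back-of-the-envelope ratios, not statistics reported
    by the authors.
    """
    return {
        # Mean number of annotated boxes per image.
        "boxes_per_image": NUM_BOXES / NUM_IMAGES,
        # Mean number of images per species (ignores multi-species images).
        "images_per_species": NUM_IMAGES / NUM_SPECIES,
        # Share of boxes with a caption, ASSUMING one caption per box.
        "captions_per_box": NUM_CAPTIONS / NUM_BOXES,
    }


if __name__ == "__main__":
    for name, value in summarize_orca_counts().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, the dataset averages a little under three boxes per image and roughly thirty images per species, with about half of the boxes carrying a caption.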
Problem

Research questions and friction points this paper is trying to address.

Develops a benchmark for marine species recognition and comprehension
Addresses limited training data and systematic task formulation in marine vision
Evaluates models on detection, captioning, and grounding tasks for marine ecosystems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal benchmark with fine-grained annotations
Evaluates 18 models on three vision tasks
Addresses marine species diversity and domain challenges
Yuk-Kwan Wong
Hong Kong University of Science and Technology
Haixin Liang
Hong Kong University of Science and Technology
Zeyu Ma
University of Electronic Science and Technology of China
Yiwei Chen
Yunnan University, Zhejiang University
Signal processing, Deep learning, Computational imaging, Quantum machine learning
Ziqiang Zheng
Hong Kong University of Science and Technology
computer vision, deep learning
Rinaldi Gotama
Indo Ocean Foundation
Pascal Sebastian
Indo Ocean Foundation
Lauren D. Sparks
Indo Ocean Foundation
Sai-Kit Yeung
Integrative Systems and Design, Hong Kong University of Science and Technology
Computer Vision, Computer Graphics, Computational Design