OceanGym: A Benchmark Environment for Underwater Embodied Agents

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Underwater embodied agents operate in extreme environments with low visibility and strong dynamic currents, which degrade perception, impair decision-making, and hinder long-horizon task completion. Method: This paper introduces OceanGym, the first comprehensive, high-fidelity benchmark for underwater embodied intelligence, covering eight realistic marine task domains. It pairs the benchmark with a unified agent framework driven by multimodal large language models (MLLMs) that processes optical and sonar data and integrates perception, memory, and sequential decision-making. Contribution/Results: Experiments reveal a substantial performance gap between current state-of-the-art MLLM-driven agents and human experts, confirming the benchmark's difficulty and diagnostic value. OceanGym provides a reproducible, scalable, and standardized testbed for evaluating underwater AI algorithms, studying generalization, and transferring capabilities to real-world robotic systems.

📝 Abstract
We introduce OceanGym, the first comprehensive benchmark for ocean underwater embodied agents, designed to advance AI in one of the most demanding real-world environments. Unlike terrestrial or aerial domains, underwater settings present extreme perceptual and decision-making challenges, including low visibility and dynamic ocean currents, which make effective agent deployment exceptionally difficult. OceanGym encompasses eight realistic task domains and a unified agent framework driven by Multi-modal Large Language Models (MLLMs), which integrates perception, memory, and sequential decision-making. Agents are required to comprehend optical and sonar data, autonomously explore complex environments, and accomplish long-horizon objectives under these harsh conditions. Extensive experiments reveal substantial gaps between state-of-the-art MLLM-driven agents and human experts, highlighting the persistent difficulty of perception, planning, and adaptability in ocean underwater environments. By providing a high-fidelity, rigorously designed platform, OceanGym establishes a testbed for developing robust embodied AI and transferring these capabilities to real-world autonomous ocean underwater vehicles, marking a decisive step toward intelligent agents capable of operating in one of Earth's last unexplored frontiers. The code and data are available at https://github.com/OceanGPT/OceanGym.
Problem

Research questions and friction points this paper is trying to address.

Addresses AI challenges in underwater environments with low visibility
Develops agents for autonomous exploration and long-term tasks
Bridges gaps between AI capabilities and real-world ocean deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal Large Language Models for underwater agents
Unified framework integrating perception and decision-making
High-fidelity benchmark for autonomous ocean vehicle testing
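
The unified framework above combines perception of optical and sonar data, a memory of past steps, and sequential decision-making. The following is a minimal sketch of such a loop under stated assumptions, not the OceanGym implementation: every name here (Observation, Agent, rule_policy, run_episode) is hypothetical, sensor inputs are reduced to short text descriptions, and a hand-written rule stands in for the MLLM call.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Observation:
    """One time step of sensor input (hypothetical structure)."""
    optical: str  # stand-in for a camera frame
    sonar: str    # stand-in for a sonar reading


@dataclass
class Agent:
    """Toy perception-memory-decision loop; not the OceanGym agent."""
    policy: Callable[[str], str]  # stand-in for an MLLM call
    memory: Deque[str] = field(default_factory=lambda: deque(maxlen=3))

    def perceive(self, obs: Observation) -> str:
        # Fuse both modalities into one textual percept.
        return f"optical={obs.optical}; sonar={obs.sonar}"

    def step(self, obs: Observation) -> str:
        percept = self.perceive(obs)
        # The prompt is the recent history followed by the current percept.
        prompt = " | ".join(list(self.memory) + [percept])
        action = self.policy(prompt)
        # Store what was seen and what was done; old entries fall off.
        self.memory.append(f"{percept} -> {action}")
        return action


def rule_policy(prompt: str) -> str:
    """Placeholder policy (assumption): reacts only to the latest percept."""
    latest = prompt.split(" | ")[-1]
    return "ascend" if "obstacle" in latest else "forward"


def run_episode(agent: Agent, observations: List[Observation]) -> List[str]:
    """Feed observations to the agent one step at a time."""
    return [agent.step(obs) for obs in observations]


if __name__ == "__main__":
    agent = Agent(policy=rule_policy)
    frames = [
        Observation("murky water", "clear"),
        Observation("murky water", "obstacle ahead"),
        Observation("dark", "clear"),
    ]
    print(run_episode(agent, frames))
```

In a real agent the policy would be a model call that receives images and sonar returns directly, and the bounded deque is only one simple choice for keeping the prompt within a fixed context budget on long-horizon tasks.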
Yida Xue
Zhejiang University, State Key Laboratory of Ocean Sensing, Zhejiang University
Mingjun Mao
Zhejiang University, State Key Laboratory of Ocean Sensing, Zhejiang University
Xiangyuan Ru
Zhejiang University, State Key Laboratory of Ocean Sensing, Zhejiang University
Yuqi Zhu
Zhejiang University, State Key Laboratory of Ocean Sensing, Zhejiang University
Baochang Ren
Zhejiang University, State Key Laboratory of Ocean Sensing, Zhejiang University
Shuofei Qiao
Zhejiang University
AI Agent, Large Language Models, Natural Language Processing, Knowledge Graphs
Mengru Wang
Zhejiang University, State Key Laboratory of Ocean Sensing, Zhejiang University
Shumin Deng
National University of Singapore
NLP, LLM Planning & Reasoning, LLM Agent, KG, IE
Xinyu An
Zhejiang University, State Key Laboratory of Ocean Sensing, Zhejiang University
Ningyu Zhang
Ph.D. Student, Vanderbilt University
artificial intelligence, learning analytics, learning environments
Ying Chen
Zhejiang University, State Key Laboratory of Ocean Sensing, Zhejiang University
Huajun Chen
Zhejiang University, State Key Laboratory of Ocean Sensing, Zhejiang University