OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning

📅 2025-09-11
🤖 AI Summary
Current embodied AI systems face two key bottlenecks: (1) poor geometric adaptability—2D vision models lack intrinsic 3D spatial understanding, while hard-coded 3D priors hinder generalization; and (2) absence of embodiment constraints—ignoring robotic physical limitations leads to inexecutable plans. This paper introduces OmniEVA, the first general-purpose embodied planning framework unifying task-adaptive 3D spatial grounding with embodiment-aware reasoning. Its core innovations are: (i) a gated-routing mechanism for dynamic 3D information fusion, selectively injecting geometric priors conditioned on task requirements; and (ii) explicit integration of robot embodiment constraints—including kinematics, physical dimensions, and actuation capabilities—into the multimodal large language model’s reasoning process. Evaluated across diverse foundational and compositional embodied tasks, OmniEVA achieves state-of-the-art performance, significantly improving cross-scenario generalization and plan feasibility.
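The gated-routing idea described above can be pictured as a small learned gate that decides, per task, how much 3D geometric feature to inject into the 2D visual token stream. The sketch below is a minimal, hypothetical illustration of that mechanism; the class name, shapes, and gating architecture are assumptions for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class GatedRouter3D(nn.Module):
    """Illustrative sketch of task-adaptive 3D fusion:
    a gate conditioned on a task/context embedding scales how much
    projected 3D geometry is added to the 2D visual tokens."""

    def __init__(self, dim: int):
        super().__init__()
        # Gate network: task embedding -> scalar in (0, 1)
        self.gate = nn.Sequential(
            nn.Linear(dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 1),
            nn.Sigmoid(),
        )
        # Project 3D features into the 2D token space before fusion
        self.proj_3d = nn.Linear(dim, dim)

    def forward(self, tokens_2d, feats_3d, task_emb):
        # g close to 0: task is handled from 2D alone;
        # g close to 1: task needs strong geometric grounding.
        g = self.gate(task_emb)                      # (batch, 1)
        fused = tokens_2d + g.unsqueeze(1) * self.proj_3d(feats_3d)
        return fused, g
```

Because the gate is a differentiable scalar, the router can be trained end-to-end, letting the model learn which tasks (e.g. spatial manipulation vs. pure language queries) warrant pulling in 3D priors.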

📝 Abstract
Recent advances in multimodal large language models (MLLMs) have opened new opportunities for embodied intelligence, enabling multimodal understanding, reasoning, and interaction, as well as continuous spatial decision-making. Nevertheless, current MLLM-based embodied systems face two critical limitations. First, the Geometric Adaptability Gap: models trained solely on 2D inputs or with hard-coded 3D geometry injection suffer from either insufficient spatial information or restricted 2D generalization, leading to poor adaptability across tasks with diverse spatial demands. Second, the Embodiment Constraint Gap: prior work often neglects the physical constraints and capacities of real robots, resulting in task plans that are theoretically valid but practically infeasible. To address these gaps, we introduce OmniEVA -- an embodied versatile planner that enables advanced embodied reasoning and task planning through two pivotal innovations: (1) a Task-Adaptive 3D Grounding mechanism, which introduces a gated router that performs explicit, selective regulation of 3D fusion based on contextual requirements, enabling context-aware 3D grounding for diverse embodied tasks; and (2) an Embodiment-Aware Reasoning framework that jointly incorporates task goals and embodiment constraints into the reasoning loop, yielding planning decisions that are both goal-directed and executable. Extensive experiments demonstrate that OmniEVA not only achieves state-of-the-art general embodied reasoning performance but also generalizes strongly across a wide range of downstream scenarios. Evaluations on a suite of proposed embodied benchmarks, covering both primitive and composite tasks, confirm its robust and versatile planning capabilities. Project page: https://omnieva.github.io
Problem

Research questions and friction points this paper is trying to address.

Addressing geometric adaptability gap in 3D spatial reasoning
Overcoming embodiment constraint gap for real robot feasibility
Enabling versatile task planning across diverse spatial scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-adaptive 3D grounding with gated router
Embodiment-aware reasoning with physical constraints
Selective 3D fusion for context-aware planning
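The embodiment-aware side of the framework amounts to checking candidate plan steps against the robot's physical limits before committing to them. The sketch below is a deliberately simplified, hypothetical example of such a feasibility filter; the constraint fields (reach, payload) and data structures are illustrative assumptions, not OmniEVA's actual constraint model.

```python
from dataclasses import dataclass

@dataclass
class Embodiment:
    """Hypothetical robot profile; real systems would also encode
    kinematics, workspace geometry, and actuation capabilities."""
    max_reach_m: float
    max_payload_kg: float

@dataclass
class PlanStep:
    action: str
    target_distance_m: float
    target_mass_kg: float

def feasible(step: PlanStep, robot: Embodiment) -> bool:
    # Reject steps that violate the robot's physical limits -- the kind
    # of check embodiment-aware reasoning folds into the planning loop.
    return (step.target_distance_m <= robot.max_reach_m
            and step.target_mass_kg <= robot.max_payload_kg)

def filter_plan(plan: list[PlanStep], robot: Embodiment) -> list[PlanStep]:
    return [step for step in plan if feasible(step, robot)]
```

For example, a mobile manipulator with a 0.8 m reach and 2 kg payload would keep a "pick up the cup" step but drop a "lift the table" step, which is precisely the gap between theoretically valid and practically executable plans that the paper targets.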
Yuecheng Liu
Huawei Noah's Ark Lab
Dafeng Chi
Huawei Noah's Ark Lab
Shiguang Wu
Huawei Noah's Ark Lab
Zhanguang Zhang
Huawei Noah's Ark Lab
Yuzheng Zhuang
Senior Researcher @ Huawei Noah's Ark Lab
Reinforcement Learning · Optimization · Autonomous Driving · Communication
Bowen Yang
Huawei Noah's Ark Lab
He Zhu
Huawei Noah's Ark Lab
Lingfeng Zhang
PhD student at Tsinghua University
Embodied AI
Pengwei Xie
Huawei Noah's Ark Lab
David Gamaliel Arcos Bravo
Huawei Noah's Ark Lab
Yingxue Zhang
Huawei Noah's Ark Lab
Jianye Hao
Huawei Noah's Ark Lab / Tianjin University
Multiagent Systems · Embodied AI
Xingyue Quan
Huawei Noah's Ark Lab