Pelican-VL 1.0: A Foundation Brain Model for Embodied Intelligence

📅 2025-10-30
🤖 AI Summary
Embodied intelligence faces challenges in environment perception and adaptive decision-making. Method: This work introduces an open-source multimodal "foundation brain model" architecture, scalable from 7B to 72B parameters and deployable across diverse physical embodiments. To address perceptual and behavioral adaptation, the authors propose DPPO (Deliberate Practice Policy Optimization), a meta-cyclic training framework that integrates reinforcement learning, refinement, diagnostic feedback, and supervised fine-tuning, mimicking human metacognition to enable efficient deliberate practice. Trained on a dataset distilled from 4B+ raw tokens using an A800 cluster, the method closes the RL loop with diagnostic feedback and SFT. Contribution/Results: The approach achieves a 20.3% improvement over its base model, outperforms open-source models exceeding 100B parameters by 10.6%, and attains state-of-the-art performance on major embodied AI benchmarks, matching or exceeding closed-source SOTA systems.

📝 Abstract
This report presents Pelican-VL 1.0, a new family of open-source embodied brain models with parameter scales ranging from 7 billion to 72 billion. Our explicit mission is to embed powerful intelligence into various embodiments. Pelican-VL 1.0 is currently the largest-scale open-source embodied multimodal brain model. Its core advantage lies in the deep integration of data power with an intelligent adaptive learning mechanism. Specifically, the metaloop distilled a high-quality dataset from a raw dataset containing 4+ billion tokens. Pelican-VL 1.0 is trained on a large-scale cluster of 1000+ A800 GPUs, consuming 50k+ A800 GPU-hours per checkpoint. This yields a 20.3% performance uplift over its base model and outperforms 100B-level open-source counterparts by 10.6%, placing it on par with leading proprietary systems on well-known embodied benchmarks. To train Pelican-VL 1.0, we establish a novel framework, DPPO (Deliberate Practice Policy Optimization), inspired by human metacognition. We operationalize this as a metaloop that teaches the AI to practice deliberately: an RL-Refine-Diagnose-SFT loop.
Problem

Research questions and friction points this paper is trying to address.

Developing open-source embodied brain models for intelligent systems
Integrating data power with adaptive learning for enhanced performance
Creating a metacognition-inspired training framework for AI improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source embodied brain models with 7B-72B parameters
Metacognition-inspired DPPO framework for deliberate practice
Metaloop RL-Refine-Diagnose-SFT training with 4B+ token dataset
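The metaloop above can be pictured as a simple control loop. This is a hypothetical sketch of the RL-Refine-Diagnose-SFT cycle as described in the abstract; every function name and data structure here is an illustrative placeholder, not the authors' actual implementation or API.

```python
# Hypothetical sketch of the DPPO metaloop: RL -> Refine -> Diagnose -> SFT,
# repeated for a fixed number of cycles. All names are illustrative.

def rl_step(model, env_data):
    # Placeholder for a policy-optimization update on embodied tasks.
    model["skill"] += 1
    return model

def refine(rollouts):
    # Placeholder: keep only high-quality rollouts as curated training data.
    return [r for r in rollouts if r["reward"] > 0]

def diagnose(model, benchmark):
    # Placeholder: probe the model to find weak capabilities to target next.
    return {"weak_areas": ["spatial reasoning"]}

def sft_step(model, refined_data, diagnosis):
    # Placeholder: supervised fine-tuning on the refined, diagnosis-targeted data.
    model["skill"] += len(refined_data)
    return model

def dppo_metaloop(model, env_data, rollouts, benchmark, n_loops=3):
    # One "deliberate practice" cycle per iteration.
    for _ in range(n_loops):
        model = rl_step(model, env_data)
        data = refine(rollouts)
        diagnosis = diagnose(model, benchmark)
        model = sft_step(model, data, diagnosis)
    return model
```

The key design idea conveyed by the paper is that diagnosis closes the loop: the SFT stage is conditioned on weaknesses identified after RL, rather than on a fixed curriculum.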
Yi Zhang
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Che Liu
Imperial College London
Multimodal learning, AI4Medicine
Xiancong Ren
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Hanchu Ni
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Shuai Zhang
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Zeyuan Ding
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Jiayu Hu
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Hanzhe Shan
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Zhenwei Niu
Khalifa University
Physical human-robot interaction, Variable stiffness actuation, Robot collision detection, Robot learning
Zhaoyang Liu
Tongyi Lab, Alibaba Group
LLM, Recommendation
Yue Zhao
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Junbo Qi
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Qinfan Zhang
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Dengjie Li
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Yidong Wang
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Jiachen Luo
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Yong Dai
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Jian Tang
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)
Xiaozhu Ju
WFM System Group, Beijing Innovation Center of Humanoid Robotics (X-Humanoid)