RynnBrain: Open Embodied Foundation Models

📅 2026-02-13

🤖 AI Summary

This work addresses the absence of a unified, physically grounded multimodal foundation model for embodied intelligence, a gap that hinders coherent perception, reasoning, and planning under real-world spatiotemporal dynamics. To close it, we propose RynnBrain, a unified embodied foundation model that integrates four core capabilities: egocentric understanding, diverse spatiotemporal localization, physically grounded reasoning, and physics-aware planning. The family spans three scales (2B, 8B, and a 30B-A3B Mixture-of-Experts model). Task-specific post-training yields four variants that perform strongly on downstream tasks, including navigation, planning, vision-language-action (VLA) control, and complex spatial reasoning. Across 20 embodied benchmarks and 8 general visual understanding benchmarks, the foundation models outperform existing embodied foundation models by a significant margin, supporting their use as a general-purpose pretrained backbone for embodied AI.

📝 Abstract

Despite rapid progress in multimodal foundation models, the embodied intelligence community still lacks a unified, physically grounded foundation model that integrates perception, reasoning, and planning within real-world spatiotemporal dynamics. We introduce RynnBrain, an open-source spatiotemporal foundation model for embodied intelligence. RynnBrain strengthens four core capabilities in a unified framework: comprehensive egocentric understanding, diverse spatiotemporal localization, physically grounded reasoning, and physics-aware planning. The RynnBrain family comprises three foundation model scales (2B, 8B, and 30B-A3B MoE) and four post-trained variants, tailored either for downstream embodied tasks (RynnBrain-Nav, RynnBrain-Plan, and RynnBrain-VLA) or for complex spatial reasoning (RynnBrain-CoP). In extensive evaluations on 20 embodied benchmarks and 8 general vision understanding benchmarks, the RynnBrain foundation models outperform existing embodied foundation models by a significant margin. The post-trained model suite further substantiates two key strengths of the RynnBrain foundation model: (i) it enables physically grounded reasoning and planning, and (ii) it serves as a strong pretrained backbone that can be efficiently adapted to diverse embodied tasks.
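The "30B-A3B" label follows a common naming convention for Mixture-of-Experts (MoE) models: roughly 30B parameters in total, of which about 3B are activated per token, because a router sends each token to only a few experts. The source does not describe RynnBrain's expert count, router design, or top-k setting, so the following is only a generic toy top-k MoE layer under stated assumptions. `ToyMoELayer` and all of its sizes are hypothetical, chosen solely to show why active parameters are a fraction of the total.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


class ToyMoELayer:
    """Hypothetical top-k MoE layer (not RynnBrain's actual architecture).

    Each expert is a dim x dim linear map; a linear router scores experts
    per input, and only the top_k highest-scoring experts are evaluated.
    """

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.top_k = top_k
        # Router: one score vector per expert.
        self.router = [[rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(num_experts)]
        # Experts: num_experts independent dim x dim weight matrices.
        self.experts = [[[rng.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)]
                         for _ in range(dim)]
                        for _ in range(num_experts)]

    def expert_params(self):
        return self.dim * self.dim

    def total_params(self):
        # Expert weights only; router weights are ignored for simplicity.
        return len(self.experts) * self.expert_params()

    def active_params(self):
        # Only top_k experts run for any single input.
        return self.top_k * self.expert_params()

    def forward(self, x):
        # Score every expert, then keep the top_k indices.
        logits = [sum(w * xi for w, xi in zip(r, x)) for r in self.router]
        top = sorted(range(len(logits)), key=lambda i: logits[i],
                     reverse=True)[: self.top_k]
        # Renormalize gate weights over the selected experts only.
        gates = softmax([logits[i] for i in top])
        out = [0.0] * self.dim
        for g, i in zip(gates, top):
            W = self.experts[i]
            y = [sum(W[r][c] * x[c] for c in range(self.dim))
                 for r in range(self.dim)]
            out = [o + g * yi for o, yi in zip(out, y)]
        return out, top


layer = ToyMoELayer(dim=4, num_experts=8, top_k=2)
print(layer.total_params(), layer.active_params())  # 128 32
```

With 8 experts and top-2 routing, only a quarter of the expert weights take part in any single forward pass. This is the same total-versus-active distinction the 30B-A3B name encodes at far larger scale.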

Problem

Research questions and friction points this paper is trying to address.

embodied intelligence

foundation model

spatiotemporal dynamics

physically grounded reasoning

perception-reasoning-planning integration

Innovation

Methods, ideas, or system contributions that make the work stand out.

embodied intelligence

spatiotemporal foundation model

physically grounded reasoning