Cosmos World Foundation Model Platform for Physical AI

📅 2025-01-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Physical AI struggles to address complex societal problems because it lacks high-fidelity, customizable world models. Method: This paper proposes the *tunable World Foundation Model (WFM)* paradigm, a unified framework for embodied intelligence that integrates video data curation, WFM pre-training, task-adaptive fine-tuning, and an efficient video tokenizer. It is the first open-source, end-to-end framework enabling a seamless transition from general world modeling to task-specific digital twins. Contribution/Results: The authors release an open-source platform and publicly available model weights, substantially lowering the barrier to physical AI world modeling. Extensive evaluation across diverse robot sim-to-real transfer tasks demonstrates the framework’s effectiveness in rapidly constructing high-fidelity, lightweight, and task-adapted digital twin environments.

📝 Abstract
Physical AI needs to be trained digitally first. It needs a digital twin of itself, the policy model, and a digital twin of the world, the world model. In this paper, we present the Cosmos World Foundation Model Platform to help developers build customized world models for their Physical AI setups. We position a world foundation model as a general-purpose world model that can be fine-tuned into customized world models for downstream applications. Our platform covers a video curation pipeline, pre-trained world foundation models, examples of post-training of pre-trained world foundation models, and video tokenizers. To help Physical AI builders solve the most critical problems of our society, we make our platform open-source and our models open-weight with permissive licenses available via https://github.com/NVIDIA/Cosmos.
Problem

Research questions and friction points this paper is trying to address.

Flexible World Model
Physical AI
Societal Issues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cosmos Platform
Physical AI
World Modeling
NVIDIA
Niket Agarwal
Arslan Ali
Senior AI Applied Research Scientist, NVIDIA
AI, DeepLearning, GenerativeAI
Maciej Bala
Yogesh Balaji
Research Scientist at NVIDIA
Machine Learning, Computer Vision, Artificial Intelligence
Erik Barker
Tiffany Cai
Prithvijit Chattopadhyay
Research Scientist, NVIDIA Research
Artificial Intelligence, Computer Vision, Machine Learning, Deep Learning, Reinforcement Learning
Yongxin Chen
Georgia Institute of Technology
control theory, machine learning, robotics, optimal transport, optimization
Yin Cui
Research Scientist, NVIDIA
Computer Vision, Machine Learning
Yifan Ding
Daniel Dworakowski
Jiaojiao Fan
NVIDIA
Generative AI
Michele Fenzi
Nvidia
Computer Vision
Francesco Ferroni
NVIDIA
machine learning, deep learning, robotics, computer vision, physics
Sanja Fidler
University of Toronto, NVIDIA
Computer Vision
Dieter Fox
University of Washington and AI2
Robotics, Artificial Intelligence, Computer Vision
Songwei Ge
Reve
Machine Learning, Computer Vision, Artificial Intelligence
Yunhao Ge
Research Scientist, NVIDIA
Deep Learning, Computer Vision, Generative AI, Robotics
Jinwei Gu
Siddharth Gururani
NVIDIA Research
Artificial Intelligence, Music Information Retrieval, Machine Learning, Deep Learning, Text to Speech
Ethan He
xAI
LLM, deep learning, multimodal, computer vision
Jiahui Huang
NVIDIA
3D Computer Vision, Graphics
J. Huffman
Pooya Jannaty
Brown University
Jingyi Jin
Seung Wook Kim
NVIDIA
Machine learning
Gergely Klár
Grace Lam
NVIDIA
machine learning
Shiyi Lan
NVIDIA
Vision, LLM Agent, Visual Gen
L. Leal-Taixé
Anqi Li
Zhaoshuo Li
Chen-Hsuan Lin
Tsung-Yi Lin
Research Scientist, NVIDIA
Computer Vision, Machine Learning
Huan Ling
University of Toronto
computer vision
Ming-Yu Liu
Xian Liu
Alice Luo
Qianli Ma
Hanzi Mao
Research Scientist, Nvidia
Deep Learning, Computer Vision
Kaichun Mo
Research Scientist at NVIDIA; Previously CS Ph.D. at Stanford
Computer Vision, Robotics, Computer Graphics
A. Mousavian
Seungjun Nah
NVIDIA
Computer Vision, Deep Learning
Sriharsha Niverty
David Page
Despoina Paschalidou
NVIDIA
Computer Vision, Machine Learning
Zeeshan Patel
xAI
Deep Learning, Generative AI, Computer Vision
Lindsey Pavao
Morteza Ramezanali
F. Reda
Xiaowei Ren
Senior Deep Learning Architect, NVIDIA
Computer Architecture
Vasanth Rao Naik Sabavat
Ed Schmerling
Stella Shi
Bartosz Stefaniak
Shitao Tang
Lyne P. Tchapmi
Stanford University
Przemek Tredak
Wei-Cheng Tseng
Department of Computer Science, University of Texas at Austin
Deep Learning, Self-supervised Learning, Speech Processing
Jibin Varghese
Hao Wang
Haoxiang Wang
Heng Wang
Ting-Chun Wang
NVIDIA Research
Computer vision, Computer graphics
Fangyin Wei
NVIDIA Research
Computer Vision, Machine Learning
Xinyue Wei
Hillbot
Computer Graphics, Computer Vision, Embodied AI
Jay Zhangjie Wu
Jiashu Xu
Wei Yang
Lin Yen-Chen
Massachusetts Institute of Technology
Robotics, Machine Learning
Xiaohui Zeng
Yuan Zeng
Jing Zhang
Qinsheng Zhang
Research Scientist, Nvidia
Machine learning, Robotics
Yuxuan Zhang
Qingqing Zhao
Stanford University
Computer Vision, Computer Graphics, Machine Learning, AI for Science, Nuclear Physics
Artur Zolkowski