REALM: A Real-to-Sim Validated Benchmark for Generalization in Robotic Manipulation

2025-12-22
AI Summary
Evaluating and improving the cross-environment generalization of Vision-Language-Action (VLA) models for robotic manipulation remains challenging due to the lack of realistic, real-to-sim-validated benchmarks. Method: We introduce REALM, a high-fidelity simulation benchmark featuring 15 environmental perturbations, 7 manipulation skills, and 3,500+ real-world-aligned objects, enabled by physics-based simulation, cross-domain-consistent control modeling, and a structured task-generation framework. We propose a standardized evaluation protocol covering mainstream VLA models, including π₀, π₀-FAST, and GR00T N1.5. Contribution/Results: Experiments demonstrate strong correlation (r > 0.92) between simulated performance and real-world behavior, exposing critical robustness deficits in current VLA models. REALM is the first open-source, reproducible, and extensible benchmark dedicated to VLA generalization evaluation, enabling rigorous, scalable assessment of cross-environment transfer capabilities.
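To make the reported sim-to-real correlation concrete, the sketch below computes a correlation coefficient between paired simulated and real-world success rates. It assumes that r denotes Pearson's correlation coefficient, which the summary does not state explicitly. The function name `pearson_r` and all numbers are illustrative placeholders, not values from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-task success rates (illustrative only, NOT results from the paper).
sim_success = [0.10, 0.35, 0.50, 0.80]
real_success = [0.15, 0.30, 0.55, 0.75]

r = pearson_r(sim_success, real_success)
print(f"sim-to-real Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that tasks ranked as easy or hard in simulation are ranked similarly on the real robot, which is the property a simulation proxy needs.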

๐Ÿ“ Abstract
Vision-Language-Action (VLA) models empower robots to understand and execute tasks described by natural language instructions. However, a key challenge lies in their ability to generalize beyond the specific environments and conditions they were trained on, which is presently difficult and expensive to evaluate in the real world. To address this gap, we present REALM, a new simulation environment and benchmark designed to evaluate the generalization capabilities of VLA models, with a specific emphasis on establishing a strong correlation between simulated and real-world performance through high-fidelity visuals and aligned robot control. Our environment offers a suite of 15 perturbation factors, 7 manipulation skills, and more than 3,500 objects. Finally, we establish two task sets that form our benchmark and evaluate the π₀, π₀-FAST, and GR00T N1.5 VLA models, showing that generalization and robustness remain an open challenge. More broadly, we also show that simulation gives us a valuable proxy for the real world and allows us to systematically probe for and quantify the weaknesses and failure modes of VLAs. Project page: https://martin-sedlacek.com/realm
Problem

Research questions and friction points this paper is trying to address.

Evaluates generalization of Vision-Language-Action models in robotics
Addresses lack of real-world validated simulation benchmarks
Systematically probes weaknesses in robotic manipulation skills
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulation environment with high-fidelity visuals and control
Benchmark with perturbation factors and manipulation skills
Validated real-to-sim correlation for generalization evaluation
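One way to picture how a benchmark built on perturbation factors can quantify robustness is sketched below: episodes are grouped by perturbation, and each perturbed success rate is compared against an unperturbed baseline. This is a minimal sketch under stated assumptions, not REALM's evaluation code. The function names, the record format, and the perturbation labels `lighting` and `camera_pose` are hypothetical, and the episode outcomes are placeholders.

```python
from collections import defaultdict


def success_rate_by_perturbation(episodes):
    """Return the success rate per perturbation from (perturbation, success) records."""
    counts = defaultdict(lambda: [0, 0])  # perturbation -> [successes, trials]
    for perturbation, success in episodes:
        counts[perturbation][0] += int(success)
        counts[perturbation][1] += 1
    return {name: s / n for name, (s, n) in counts.items()}


def robustness_drop(rates, baseline="none"):
    """Drop in success rate of each perturbation relative to the baseline condition."""
    base = rates[baseline]
    return {name: base - rate for name, rate in rates.items() if name != baseline}


# Placeholder episode outcomes (hypothetical labels, NOT data from the paper).
episodes = [
    ("none", True), ("none", True), ("none", True), ("none", False),
    ("lighting", True), ("lighting", True), ("lighting", False), ("lighting", False),
    ("camera_pose", True), ("camera_pose", False), ("camera_pose", False), ("camera_pose", False),
]

rates = success_rate_by_perturbation(episodes)
drops = robustness_drop(rates)
for name, drop in sorted(drops.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: success {rates[name]:.2f}, drop vs. baseline {drop:.2f}")
```

Sorting by the drop puts the most damaging perturbation first, which is the kind of per-factor failure analysis the benchmark is meant to support.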
Martin Sedlacek
Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague
Pavlo Yefanov
Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague
Georgy Ponimatkin
Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague
Jai Bardhan
Predoctoral Research Fellow, TCS Research
Computer Vision, Deep Learning, Representation Learning, Computer Graphics, Physics
Simon Pilc
Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague
Mederic Fourmy
CIIRC CVUT
Robotics, State estimation, MPC
Evangelos Kazakos
Czech Technical University in Prague
Computer Vision, Machine Learning
Cees G. M. Snoek
Professor of Computer Science, University of Amsterdam
Video Understanding: computer vision, multimodal learning, machine learning, artificial intelligence
Josef Sivic
Czech Technical University, CIIRC, ELLIS Unit Prague
computer vision, machine learning
Vladimir Petrik
Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague