AI Summary
Addressing the challenges of sequential decision-making, stringent physical constraints, and high perceptual uncertainty in long-horizon robotic manipulation tasks, this paper introduces RoboSeek, a novel embodied manipulation framework driven by interactive experience. Methodologically, RoboSeek establishes a real-to-sim-to-real transfer pipeline: multi-view 3D reconstruction generates photorealistic and physically consistent simulation environments, enabling efficient policy transfer from simulation to reality; it further integrates reinforcement learning with cross-entropy optimization, incorporating visual priors to enhance policy generalization. Evaluated on eight complex long-horizon manipulation tasks, RoboSeek achieves a mean success rate of 79%, substantially outperforming existing baselines. This demonstrates its robustness in dynamic real-world environments and strong cross-task deployability.
Abstract
Optimizing and refining action execution through
exploration and interaction is a promising direction for robotic
manipulation. However, practical approaches to interaction-driven robotic learning remain underexplored, particularly for
long-horizon tasks where sequential decision-making, physical
constraints, and perceptual uncertainties pose significant challenges. Motivated by embodied cognition theory, we propose
RoboSeek, a framework for embodied action execution that
leverages interactive experience to accomplish manipulation
tasks. RoboSeek optimizes prior knowledge from high-level
perception models through closed-loop training in simulation
and achieves robust real-world execution via a real2sim2real
transfer pipeline. Specifically, we first replicate real-world
environments in simulation using 3D reconstruction to provide
visually and physically consistent environments. Then we train
policies in simulation using reinforcement learning and the
cross-entropy method leveraging visual priors. The learned
policies are subsequently deployed on real robotic platforms
for execution. RoboSeek is hardware-agnostic and is evaluated
on multiple robotic platforms across eight long-horizon manipulation tasks involving sequential interactions, tool use, and
object handling. Our approach achieves an average success rate
of 79%, significantly outperforming baselines whose success
rates remain below 50%, highlighting its generalization and
robustness across tasks and platforms. Experimental results
validate the effectiveness of our training framework in complex,
dynamic real-world settings and demonstrate the stability of the
proposed real2sim2real transfer mechanism, paving the way for
more generalizable embodied robotic learning. Project Page:
https://russderrick.github.io/Roboseek/
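The abstract mentions training policies with the cross-entropy method (CEM). As a rough illustration of the general technique (not the paper's actual implementation; the reward function, dimensions, and hyperparameters below are hypothetical), CEM repeatedly samples candidate action vectors from a Gaussian, keeps the top-scoring "elite" fraction, and refits the Gaussian to those elites:

```python
import numpy as np

def cross_entropy_method(reward_fn, dim, iters=20, pop=64, elite_frac=0.1, seed=0):
    """Generic cross-entropy method sketch: iteratively refit a Gaussian
    over candidate action vectors to the highest-reward (elite) samples.

    reward_fn maps a (dim,) action vector to a scalar reward.
    All names and defaults here are illustrative, not from RoboSeek.
    """
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)               # initial sampling distribution
    std = np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))
        rewards = np.array([reward_fn(s) for s in samples])
        elites = samples[np.argsort(rewards)[-n_elite:]]   # best candidates
        # Refit the sampling distribution to the elites (small floor on std
        # keeps the search from collapsing prematurely).
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean

# Toy check: maximizing -||a - target||^2 should drive the mean toward target.
target = np.array([0.5, -1.0, 2.0])
best = cross_entropy_method(lambda a: -np.sum((a - target) ** 2), dim=3)
```

In a simulation-based pipeline like the one described, `reward_fn` would be replaced by a rollout in the reconstructed environment, with the visual priors shaping the initial sampling distribution rather than the zero-mean Gaussian used here.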