🤖 AI Summary
This work addresses the challenge of enabling mobile robots to perform non-prehensile multi-object rearrangement in cluttered environments using only egocentric vision. The proposed EgoPush framework eschews brittle global state estimation by encoding object-centric relative spatial relationships in a latent space, and restricts the teacher policy to visually accessible cues, which induces active perception behaviors. By integrating teacher-student policy distillation with a staged local reward mechanism, EgoPush effectively mitigates long-horizon credit assignment difficulties. Operating exclusively on monocular visual input, the method substantially outperforms end-to-end reinforcement learning baselines in simulation and achieves zero-shot sim-to-real transfer, demonstrating the efficacy and generalization capability of its core components.
📝 Abstract
Humans can rearrange objects in cluttered environments using egocentric perception, navigating occlusions without global coordinates. Inspired by this capability, we study long-horizon multi-object non-prehensile rearrangement for mobile robots using a single egocentric camera. We introduce EgoPush, a policy learning framework that enables egocentric, perception-driven rearrangement without relying on explicit global state estimation, which often fails in dynamic scenes. EgoPush introduces an object-centric latent space that encodes relative spatial relations among objects rather than absolute poses. This design enables a privileged reinforcement-learning (RL) teacher to jointly learn latent states and mobile actions from sparse keypoints; the teacher is then distilled into a purely visual student policy. To reduce the supervision gap between the omniscient teacher and the partially observed student, we restrict the teacher's observations to visually accessible cues. This induces active perception behaviors that are recoverable from the student's viewpoint. To address long-horizon credit assignment, we decompose rearrangement into stage-level subproblems using temporally decayed, stage-local completion rewards. Extensive simulation experiments demonstrate that EgoPush significantly outperforms end-to-end RL baselines in success rate, with ablation studies validating each design choice. We further demonstrate zero-shot sim-to-real transfer on a real mobile platform. Code and videos are available at https://ai4ce.github.io/EgoPush/.
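To make the two training ingredients above concrete, here is a minimal sketch of (1) a temporally decayed, stage-local completion reward and (2) a behavior-cloning distillation loss from teacher to student actions. All function names, bonus/decay values, and signatures are illustrative assumptions, not the paper's actual implementation:

```python
# Hedged sketch; the bonus, decay, and step-cost constants are invented.
import numpy as np

def stage_local_reward(steps_in_stage, stage_completed,
                       bonus=10.0, decay=0.99, step_cost=0.01):
    """Reward confined to the current stage: a completion bonus that decays
    with time spent inside the stage (faster completion pays more), plus a
    small per-step cost so the agent does not stall."""
    reward = -step_cost
    if stage_completed:
        reward += bonus * decay ** steps_in_stage
    return reward

def distillation_loss(student_actions, teacher_actions):
    """Mean-squared error between the visual student's actions and the
    privileged teacher's actions on the same states (behavior cloning)."""
    s = np.asarray(student_actions, dtype=float)
    t = np.asarray(teacher_actions, dtype=float)
    return float(np.mean((s - t) ** 2))
```

Because the reward is local to each stage, credit for finishing a stage never has to propagate across the whole long-horizon episode, and the decay term still pressures the policy toward efficient completions within the stage.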