Is an object-centric representation beneficial for robotic manipulation ?

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Conventional holistic representations show limited robustness on robotic manipulation tasks that involve multiple objects and require generalization. Method: This work systematically evaluates object-centric representations (OCRs) for downstream embodied manipulation tasks. It introduces a highly randomized multi-object simulation environment, integrates classic unsupervised object discovery models into a reinforcement learning framework, and benchmarks them against state-of-the-art holistic representation methods across diverse manipulation tasks. Contribution/Results: In these experiments, OCRs improve data efficiency and cross-scenario generalization, and they remain more robust and stable than holistic baselines in challenging interactive tasks. The results support OCRs as a promising representation for embodied intelligence, particularly for robots that must reason about and manipulate multiple objects in dynamic, unstructured environments.

📝 Abstract
Object-centric representation (OCR) has recently become a subject of interest in the computer vision community for learning a structured representation of images and videos. It has been presented several times as a potential way to improve data efficiency and generalization when training an agent on downstream tasks. However, most existing work only evaluates such models on scene decomposition, without any notion of reasoning over the learned representation. Robotic manipulation tasks generally involve multi-object environments with potential inter-object interaction. We thus argue that they are a well-suited testbed for evaluating the potential of existing object-centric work. To do so, we create several robotic manipulation tasks in simulated environments involving multiple objects (several distractors, the robot, etc.) and a high level of randomization (object positions, colors, shapes, background, initial positions, etc.). We then evaluate one classical object-centric method across several generalization scenarios and compare its results against several state-of-the-art holistic representations. Our results show that existing methods are prone to failure in difficult scenarios involving complex scene structures, whereas object-centric methods help overcome these challenges.
Problem

Research questions and friction points this paper is trying to address.

Evaluates object-centric representation for robotic manipulation tasks
Compares OCR with holistic methods in multi-object environments
Assesses OCR's generalization in randomized, complex scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Object-centric representation for robotic manipulation
Evaluation in simulated multi-object environments
Comparison with holistic representation methods
Alexandre Chapin
Ecole Centrale Lyon
Artificial intelligence, Robotics, Representation learning
Emmanuel Dellandrea
Associate Professor, Ecole Centrale de Lyon
Computer Vision, Machine Learning, Robotics
Liming Chen
Ecole Centrale de Lyon, CNRS, INSA Lyon, Université Claude Bernard Lyon 1, Université Lumière Lyon 2, LIRIS, UMR5205, 69130 Ecully, France