🤖 AI Summary
Embodied robots that rely on multimodal observations still struggle with 3D scene representation and human-level task learning when adapting to new sequential tasks in unstructured environments. Method: This paper proposes NBAgent, a lifelong (never-ending) language-conditioned behavior-cloning framework for real-world scenarios. It introduces what it describes as the first lifelong behavior-cloning paradigm; designs a skill-shared semantic rendering module and a skill-shared representation distillation module to mitigate overlooked 3D scene representation; and develops a skill-specific evolving planner that incrementally embeds new skill knowledge in a latent low-rank space, as humans do. Contribution/Results: Evaluated on a newly established lifelong manipulation benchmark, the method significantly outperforms state-of-the-art approaches and adapts well across sequentially learned tasks. The code, dataset, and visualization results are publicly released.
📝 Abstract
Relying on multi-modal observations, embodied robots can perform multiple robotic manipulation tasks in unstructured real-world environments. However, most language-conditioned behavior-cloning agents still face long-standing challenges, i.e., 3D scene representation and human-level task learning, when adapting to new sequential tasks in practical scenarios. We investigate these challenges with NBAgent, a pioneering language-conditioned Never-ending Behavior-cloning Agent for embodied robots. It continually learns observation knowledge of novel 3D scene semantics from skill-shared attributes and robot manipulation skills from skill-specific attributes. Specifically, we propose a skill-shared semantic rendering module and a skill-shared representation distillation module to effectively learn 3D scene semantics from skill-shared attributes, which addresses the problem of overlooked 3D scene representation. Meanwhile, we establish a skill-specific evolving planner to decouple manipulation knowledge; it continually embeds novel skill-specific knowledge in a latent, low-rank space, as humans do. Finally, we design a never-ending embodied robot manipulation benchmark, and extensive experiments demonstrate the strong performance of our method. Visual results, code, and dataset are provided at: https://neragent.github.io.
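The abstract does not specify how the skill-specific evolving planner is built, so the following is only an illustrative sketch of the general idea of embedding per-skill knowledge in a low-rank space, not NBAgent's implementation. It assumes a LoRA-style scheme in which a frozen shared weight matrix is combined with one small pair of low-rank factors per skill. The class `LowRankSkillLayer`, its methods, and the skill names are hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LowRankSkillLayer:
    """Hypothetical layer: frozen shared weight plus a low-rank update per skill."""

    def __init__(self, d_in, d_out, rank, seed=0):
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        self.rng = random.Random(seed)
        # Shared weight, assumed frozen once the first skills are learned.
        self.shared = [[self.rng.gauss(0, 0.1) for _ in range(d_in)]
                       for _ in range(d_out)]
        self.skills = {}  # skill name -> (up, down) low-rank factors

    def add_skill(self, name):
        """Register factors for a new skill without touching existing ones."""
        # `up` starts at zero, so a new skill initially matches the shared layer.
        up = [[0.0] * self.rank for _ in range(self.d_out)]
        down = [[self.rng.gauss(0, 0.1) for _ in range(self.d_in)]
                for _ in range(self.rank)]
        self.skills[name] = (up, down)

    def skill_param_count(self):
        """Parameters added per skill: rank * (d_in + d_out)."""
        return self.rank * (self.d_in + self.d_out)

    def forward(self, x, skill):
        """Compute (shared + up @ down) @ x for the requested skill."""
        up, down = self.skills[skill]
        base = matvec(self.shared, x)
        delta = matvec(up, matvec(down, x))
        return [b + d for b, d in zip(base, delta)]


layer = LowRankSkillLayer(d_in=4, d_out=3, rank=2)
layer.add_skill("open_drawer")
layer.skills["open_drawer"][0][0][0] = 1.0  # stand-in for a training update
x = [1.0, 2.0, 3.0, 4.0]
before = layer.forward(x, "open_drawer")
layer.add_skill("stack_blocks")  # a later skill in the task sequence
after = layer.forward(x, "open_drawer")
print(before == after)  # earlier skill's output is unaffected by the new skill
```

Under these assumptions, each new skill adds only `rank * (d_in + d_out)` parameters, and earlier skills keep their own factors, which is one simple way to limit interference between sequentially learned skills.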