🤖 AI Summary
This work addresses catastrophic forgetting in hand-eye calibration models for open-world robotic manipulation, which arises when a model adapts to changing scenes. The authors propose a continual hand-eye calibration framework that combines a Spatial-Aware Replay Strategy (SARS) with Structure-Preserving Dual Distillation (SPDD). SARS builds a replay buffer through geometrically uniform sampling of each scene's pose space, and SPDD separately preserves knowledge of coarse scene layout and fine pose precision. Experiments on multiple public datasets indicate that the framework adapts to new environments while retaining calibration accuracy on previously encountered scenes, mitigating forgetting at both the task and sample levels and outperforming existing continual learning methods.
📝 Abstract
Hand-eye calibration through visual localization is a critical capability for robotic manipulation in open-world environments. However, most deep learning-based calibration models suffer from catastrophic forgetting when adapting to unseen data as open-world scenes change, and simple rehearsal-based continual learning strategies do not adequately mitigate this issue. To overcome this challenge, we propose a continual hand-eye calibration framework that enables robots to adapt to sequentially encountered open-world manipulation scenes through a spatial-aware replay strategy and structure-preserving distillation. Specifically, a Spatial-Aware Replay Strategy (SARS) constructs a geometrically uniform replay buffer that ensures comprehensive coverage of each scene's pose space, replacing redundant adjacent frames with maximally informative viewpoints. Meanwhile, Structure-Preserving Dual Distillation (SPDD) decomposes localization knowledge into coarse scene layout and fine pose precision, and distills the two separately to alleviate both types of forgetting during continual adaptation. When a new manipulation scene arrives, SARS provides geometrically representative replay samples from all prior scenes, and SPDD applies structured distillation on these samples to retain previously learned knowledge. After training on the new scene, SARS adds selected samples from that scene to the replay buffer for future rehearsal, allowing the model to continuously accumulate multi-scene calibration capability. Experiments on multiple public datasets show substantially reduced scene forgetting: the framework maintains accuracy on past scenes while still adapting to new ones, confirming its effectiveness.
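The abstract does not specify how SARS selects its geometrically uniform samples. As a rough illustration only, the sketch below uses greedy farthest-point sampling over camera positions, a common way to obtain spatially spread-out subsets. The function name `farthest_point_sample`, the use of 3D positions alone (orientation is ignored here), and the toy trajectory are all assumptions for this example, not the authors' method.

```python
import math
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]


def farthest_point_sample(
    positions: Sequence[Position], k: int, start: int = 0
) -> List[int]:
    """Greedy farthest-point sampling (hypothetical stand-in for SARS selection).

    Returns the indices of up to k frames whose camera positions are
    mutually spread out, beginning from the frame at index `start`.
    """
    n = len(positions)
    if n == 0 or k <= 0:
        return []
    k = min(k, n)
    selected = [start]
    # Distance from every frame to its nearest already-selected frame.
    nearest = [math.dist(p, positions[start]) for p in positions]
    while len(selected) < k:
        # Pick the frame that is farthest from everything chosen so far.
        idx = max(range(n), key=lambda i: nearest[i])
        selected.append(idx)
        for i, p in enumerate(positions):
            nearest[i] = min(nearest[i], math.dist(p, positions[idx]))
    return selected


# Toy trajectory (assumed data): ten nearly identical adjacent frames,
# followed by three frames taken from clearly different viewpoints.
trajectory = [(0.01 * i, 0.0, 0.0) for i in range(10)] + [
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (1.0, 1.0, 0.0),
]
buffer_indices = farthest_point_sample(trajectory, k=4)
```

With a budget of four samples, the selection keeps the starting frame and the three distant viewpoints and skips the nine near-duplicate neighbors of the first frame, which mirrors the stated goal of replacing redundant adjacent frames with more informative ones. A fuller version would presumably measure distance in the full pose space, including rotation.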