Coarse-to-Fine 3D Keyframe Transporter

📅 2025-02-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current keyframe imitation learning (IL) methods neglect the spatial symmetries inherent in robotic manipulation tasks, which hurts sample efficiency and limits generalization. This work identifies and formalizes the bi-equivariance of keyframe action policies: equivariance under transformations of both the workspace and the object grasped by the gripper. It proposes a coarse-to-fine SE(3) action evaluation scheme for reasoning about the intertwined translation and rotation components of an action. Building on Transporter Networks, it introduces the Keyframe Transporter, which evaluates actions by cross-correlating features of the grasped object with features of the scene. Across a wide range of simulated manipulation tasks, the method outperforms strong keyframe IL baselines by an average of more than 10%; in 4 physical experiments, it outperforms them by an average of 55%.

📝 Abstract

Recent advances in Keyframe Imitation Learning (IL) have enabled learning-based agents to solve a diverse range of manipulation tasks. However, most approaches ignore the rich symmetries in the problem setting and, as a consequence, are sample-inefficient. This work identifies and utilizes the bi-equivariant symmetry within Keyframe IL to design a policy that generalizes to transformations of both the workspace and the objects grasped by the gripper. We make two main contributions. First, we analyze the bi-equivariance properties of the keyframe action scheme and propose a Keyframe Transporter, derived from Transporter Networks, which evaluates actions using cross-correlation between the features of the grasped object and the features of the scene. Second, we propose a computationally efficient coarse-to-fine SE(3) action evaluation scheme for reasoning about the intertwined translation and rotation components of an action. The resulting method outperforms strong Keyframe IL baselines by an average of more than 10% on a wide range of simulation tasks, and by an average of 55% in 4 physical experiments.
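The cross-correlation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `cross_correlate` and `best_translation`, the dense voxel-grid feature layout `(channels, depth, height, width)`, and the synthetic features are all assumptions made for illustration, and only translation is scored here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def cross_correlate(scene_feat, object_feat):
    """Score every translation of an object feature volume inside a scene volume.

    scene_feat:  array of shape (C, D, H, W), hypothetical scene features.
    object_feat: array of shape (C, d, h, w), hypothetical grasped-object features.
    Returns an array of shape (D-d+1, H-h+1, W-w+1) of correlation scores.
    """
    _, d, h, w = object_feat.shape
    # windows has shape (C, D-d+1, H-h+1, W-w+1, d, h, w)
    windows = sliding_window_view(scene_feat, (d, h, w), axis=(1, 2, 3))
    # Sum the element-wise product over channels and the window extent.
    return np.einsum("cxyzijk,cijk->xyz", windows, object_feat)


def best_translation(scene_feat, object_feat):
    """Return the highest-scoring voxel offset and the full score volume."""
    scores = cross_correlate(scene_feat, object_feat)
    offset = np.unravel_index(np.argmax(scores), scores.shape)
    return tuple(int(i) for i in offset), scores


# Synthetic example: embed the object features in an otherwise empty scene.
rng = np.random.default_rng(0)
object_feat = rng.uniform(0.5, 1.0, size=(2, 2, 2, 2))
scene_feat = np.zeros((2, 6, 6, 6))
scene_feat[:, 1:3, 2:4, 3:5] = object_feat

offset, scores = best_translation(scene_feat, object_feat)
print(offset, scores.shape)
```

In this toy setup the score peaks at the offset where the object features were embedded, because the correlation of a pattern with itself is never smaller than its correlation with a shifted copy of itself. A learned model would replace the synthetic arrays with encoder outputs and would also need to score rotations.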

Problem

Research questions and friction points this paper is trying to address.

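Most Keyframe IL methods ignore the bi-equivariant symmetry of the problem and are therefore sample-inefficient

Intertwined SE(3) translation and rotation actions need a computationally efficient evaluation scheme

Performance should improve over Keyframe IL baselines in both simulation and physical tasks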

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bi-equivariant Keyframe Transporter design

Coarse-to-fine SE(3) action evaluation

Action evaluation by cross-correlating grasped-object and scene features
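The coarse-to-fine idea listed above can be sketched as a generic multi-level grid search. This is an illustrative assumption rather than the paper's SE(3) scheme: it searches only three rotation angles, ignores angle wrap-around, and the names `coarse_to_fine_search`, `score_fn`, and `target` are hypothetical. In a real system the score would come from a learned action-value model.

```python
import itertools

import numpy as np


def coarse_to_fine_search(score_fn, center, half_range, points_per_axis=5, levels=4):
    """Maximize score_fn over a box by repeatedly refining a grid around the best point.

    score_fn:   hypothetical callable mapping a parameter vector to a scalar score.
    center:     initial center of the search box (one entry per parameter).
    half_range: half-width of the initial search box along every axis.
    """
    center = np.asarray(center, dtype=float)
    for _ in range(levels):
        offsets = np.linspace(-half_range, half_range, points_per_axis)
        # Evaluate every grid point in the current box.
        candidates = [
            center + np.array(delta)
            for delta in itertools.product(offsets, repeat=center.size)
        ]
        scores = [score_fn(c) for c in candidates]
        center = candidates[int(np.argmax(scores))]
        # Shrink the box to one grid step around the best candidate.
        half_range = 2.0 * half_range / (points_per_axis - 1)
    return center


# Toy score: highest when the three angles match a hypothetical target rotation.
target = np.array([0.3, -1.0, 2.0])


def toy_score(angles):
    return -float(np.sum((angles - target) ** 2))


best = coarse_to_fine_search(toy_score, center=[0.0, 0.0, 0.0], half_range=np.pi)
print(best)
```

With these settings the sketch scores 4 × 125 = 500 candidates, whereas a dense grid at the same final resolution of π/16 per axis would need 33³ = 35,937 evaluations. The refinement is only reliable when the score is well behaved enough that the coarse maximum lies near the true one, which the toy quadratic score guarantees by construction.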
