🤖 AI Summary
Existing unsupervised keypoint detection methods rely on image reconstruction, lack depth awareness, and suffer from spurious keypoint detections in background regions. To address these limitations, we propose Distill-DKP, a novel cross-modal knowledge distillation framework that transfers supervision from depth maps (teacher) to RGB images (student). Distill-DKP introduces the first embedding-level depth semantic transfer mechanism with explicit background suppression. It integrates depth map encoding, RGB-Depth dual-stream feature alignment, and embedding-level loss constraints. Evaluated on Human3.6M, Distill-DKP reduces mean L2 error by 47.15%; on Taichi, it lowers mean average error by 5.67%; and on DeepFashion, it improves keypoint accuracy by 1.3%. Crucially, inference requires only the student RGB model; no depth labels or manual annotations are needed. This work establishes a new paradigm for depth-aware unsupervised keypoint learning via cross-modal distillation at the embedding level.
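To make the embedding-level transfer concrete, here is a minimal sketch of how a teacher's depth embeddings might guide a student's RGB embeddings via a cosine-similarity distillation loss. This is an illustrative assumption, not the paper's exact loss: the function name and the choice of cosine distance are hypothetical, and the real framework also includes depth encoding and dual-stream alignment not shown here.

```python
import numpy as np

def embedding_distill_loss(student_emb: np.ndarray, teacher_emb: np.ndarray) -> float:
    """Hypothetical embedding-level distillation loss.

    Both inputs have shape (num_keypoints, embed_dim): per-keypoint
    embeddings from the RGB student and the frozen depth teacher.
    Returns the mean cosine distance (1 - cosine similarity), so
    perfectly aligned embeddings give a loss of 0.
    """
    # L2-normalize each embedding vector along the feature dimension.
    s = student_emb / np.linalg.norm(student_emb, axis=-1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=-1, keepdims=True)
    # Cosine similarity per keypoint, averaged into a scalar loss.
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))
```

In training, this term would be minimized alongside the student's reconstruction objective; at inference time the teacher (and the loss) is discarded, leaving only the RGB student, consistent with the summary above.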
📝 Abstract
Existing unsupervised keypoint detection methods apply artificial deformations to images, such as masking a significant portion of the image, and use reconstruction of the original image as the learning objective for detecting keypoints. However, this approach lacks depth information and often detects keypoints on the background. To address this, we propose Distill-DKP, a novel cross-modal knowledge distillation framework that leverages depth maps and RGB images for keypoint detection in a self-supervised setting. During training, Distill-DKP extracts embedding-level knowledge from a depth-based teacher model to guide an image-based student model, with inference restricted to the student. Experiments show that Distill-DKP significantly outperforms previous unsupervised methods, reducing mean L2 error by 47.15% on Human3.6M and mean average error by 5.67% on Taichi, and improving keypoint accuracy by 1.3% on the DeepFashion dataset. Detailed ablation studies demonstrate the sensitivity of knowledge distillation across different layers of the network. Project Page: https://23wm13.github.io/distill-dkp/