AI Summary
Few-shot keypoint detection suffers from the absence of in-distribution source data for supervision. Method: This paper introduces, for the first time, user-drawn sketches as a source-free supervisory signal. To bridge the cross-modal gap between sketches and images and mitigate individual drawing-style bias, we propose a prototype-based framework integrating a grid-based locator and a prototype-level domain adaptation mechanism, enabling sketch-image semantic alignment and precise keypoint regression. Contribution/Results: Our method requires no in-distribution source data and supports rapid generalization to novel categories and unseen keypoints. Extensive experiments on multiple benchmarks demonstrate that it significantly outperforms existing few-shot keypoint detection approaches, achieving strong robustness across diverse sketch styles and high localization accuracy.
Abstract
Keypoint detection, integral to modern machine perception, faces challenges in few-shot learning, particularly when source data from the same distribution as the query is unavailable. This gap is addressed by leveraging sketches, a popular form of human expression, providing a source-free alternative. However, challenges arise in mastering cross-modal embeddings and handling user-specific sketch styles. Our proposed framework overcomes these hurdles with a prototypical setup, combined with a grid-based locator and prototypical domain adaptation. We also demonstrate success in few-shot convergence across novel keypoints and classes through extensive experiments.