MimiCAT: Mimic with Correspondence-Aware Cascade-Transformer for Category-Free 3D Pose Transfer

📅 2025-11-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D pose transfer methods heavily rely on structural similarity between source and target characters, limiting generalization to cross-category scenarios (e.g., humanoid → quadruped) and causing region misalignment and distortion. This work proposes the first category-agnostic, cross-structural 3D pose transfer framework. First, we introduce a semantic keypoint-driven soft correspondence matching mechanism enabling many-to-many regional alignment. Second, we construct a large-scale, million-sample cross-category pose dataset. Third, we design a cascaded Transformer architecture that jointly integrates soft correspondence projection and shape-conditioned representation, formulating pose transfer as a conditional generation task. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches in both qualitative and quantitative evaluations, achieving high-fidelity, natural pose transfer across species and skeletal structures.

📝 Abstract
3D pose transfer aims to transfer the pose of a source mesh to a target character while preserving both the target's geometry and the source's pose characteristics. Existing methods are largely restricted to characters with similar structures and fail to generalize to category-free settings (e.g., transferring a humanoid's pose to a quadruped). The key challenge lies in the structural and transformation diversity inherent in distinct character types, which often leads to mismatched regions and poor transfer quality. To address these issues, we first construct a million-scale pose dataset across hundreds of distinct characters. We further propose MimiCAT, a cascade-transformer model designed for category-free 3D pose transfer. Instead of relying on strict one-to-one correspondence mappings, MimiCAT leverages semantic keypoint labels to learn a novel soft correspondence that enables flexible many-to-many matching across characters. The pose transfer is then formulated as a conditional generation process, in which the source transformations are first projected onto the target through soft correspondence matching and subsequently refined using shape-conditioned representations. Extensive qualitative and quantitative experiments demonstrate that MimiCAT transfers plausible poses across different characters, significantly outperforming prior methods that are limited to narrow category transfer (e.g., humanoid-to-humanoid).
Problem

Research questions and friction points this paper is trying to address.

Addressing structural diversity challenges in category-free 3D pose transfer
Overcoming mismatched regions during pose transfer between different character types
Enabling flexible many-to-many correspondence across structurally diverse characters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cascade-transformer model for category-free 3D pose transfer
Learns soft correspondence using semantic keypoint labels
Projects source transformations through many-to-many matching
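
The idea of many-to-many soft correspondence can be illustrated with a toy sketch. This is not the MimiCAT implementation: the feature vectors, the dot-product similarity, the `temperature` value, and the helper names `soft_correspondence` and `project` are all assumptions made for illustration, and per-keypoint 3D offsets stand in for the source transformations. Each target keypoint receives a softmax-weighted blend of all source keypoints rather than a single hard match.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_correspondence(target_feats, source_feats, temperature=0.1):
    """Row-stochastic matrix W: W[i][j] is the weight of source keypoint j
    for target keypoint i (dot-product similarity is an assumption)."""
    weights = []
    for t in target_feats:
        scores = [sum(a * b for a, b in zip(t, s)) for s in source_feats]
        weights.append(softmax(scores, temperature))
    return weights


def project(weights, source_values):
    """Blend per-keypoint source vectors onto each target keypoint."""
    dim = len(source_values[0])
    return [
        [sum(w * v[k] for w, v in zip(row, source_values)) for k in range(dim)]
        for row in weights
    ]


# Hypothetical data: two source keypoints, three target keypoints.
source_feats = [[1.0, 0.0], [0.0, 1.0]]
target_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
source_offsets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # toy stand-in for transformations

W = soft_correspondence(target_feats, source_feats)
projected = project(W, source_offsets)
```

In this toy setup the third target keypoint is equally similar to both source keypoints, so it receives an even blend of their offsets, while the first two target keypoints match almost exclusively to one source keypoint each. According to the abstract, the actual method follows this projection step with a refinement using shape-conditioned representations, which the sketch does not attempt to model.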
Zenghao Chai
School of Computing, National University of Singapore
Chen Tang
MMLab, The Chinese University of Hong Kong
Yongkang Wong
School of Computing, National University of Singapore
Xulei Yang
Principal Scientist & Group Leader, A*STAR, Singapore
3D Vision, Artificial Intelligence, Medical Imaging
Mohan Kankanhalli
School of Computing, National University of Singapore