TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation

📅 2024-10-31
🏛️ Neural Information Processing Systems
📈 Citations: 3
Influential: 0
🤖 AI Summary
Existing diffusion-based human image animation methods suffer significant degradation in generation quality when scale or rotational misalignment exists between the reference image and target pose—severely limiting practical applicability. To address this, we propose Test-time Procrustes Calibration (TPC), the first geometric calibration technique integrated into diffusion-based animation frameworks. TPC performs training-free, plug-and-play geometric realignment of the reference image at test time via Singular Value Decomposition (SVD)-based Procrustes analysis, and adaptively adjusts the diffusion model’s conditional inputs accordingly. The method is model-agnostic, incurs zero training overhead, and introduces no additional inference latency. Extensive evaluations across multiple benchmarks demonstrate that TPC reduces FID by 23.6% and improves keypoint consistency by 31.4%, effectively mitigating the pervasive composition misalignment problem in real-world scenarios.
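The SVD-based Procrustes step the summary describes can be sketched as a classical orthogonal Procrustes fit between 2D keypoint sets. This is a minimal illustration under assumed inputs (the function name and the keypoint-array interface are our own), not the paper's actual calibration code:

```python
import numpy as np

def procrustes_align(ref_pts, tgt_pts):
    """Similarity transform (scale s, rotation R, translation t) mapping
    reference keypoints onto target keypoints: tgt ~ s * ref @ R.T + t.
    Classical SVD-based Procrustes analysis; a stand-in sketch, not the
    paper's exact implementation."""
    # Center both point sets around their means
    ref_c = ref_pts - ref_pts.mean(axis=0)
    tgt_c = tgt_pts - tgt_pts.mean(axis=0)
    # SVD of the cross-covariance yields the optimal rotation
    U, S, Vt = np.linalg.svd(tgt_c.T @ ref_c)
    # Guard against reflections: force det(R) = +1
    signs = np.ones(len(S))
    signs[-1] = np.sign(np.linalg.det(U @ Vt))
    R = (U * signs) @ Vt
    # Optimal isotropic scale, then the translation closing the residual gap
    s = (S * signs).sum() / (ref_c ** 2).sum()
    t = tgt_pts.mean(axis=0) - s * ref_pts.mean(axis=0) @ R.T
    return s, R, t
```

In a TPC-style pipeline, the recovered similarity transform would then be used to warp the reference image so its human shape matches the scale and rotation of the target pose frame before the diffusion model consumes it.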

📝 Abstract
Human image animation aims to generate a human motion video from a reference human image and a target motion video. Current diffusion-based image animation systems transfer human identity into the targeted motion with high precision, yet their output quality remains irregular. Their optimal precision is achieved only when the physical compositions (i.e., scale and rotation) of the human shapes in the reference image and target pose frame are aligned; in the absence of such alignment, fidelity and consistency decline noticeably. In real-world environments especially, this compositional misalignment occurs frequently, posing significant challenges to the practical usage of current systems. To this end, we propose Test-time Procrustes Calibration (TPC), which enhances the robustness of diffusion-based image animation systems by maintaining optimal performance even under compositional misalignment, effectively addressing real-world scenarios. TPC provides a calibrated reference image to the diffusion model, enhancing its capability to understand the correspondence between human shapes in the reference and target images. Our method is simple and can be applied to any diffusion-based image animation system in a model-agnostic manner, improving effectiveness at test time without additional training.
Problem

Research questions and friction points this paper is trying to address.

Addresses irregular output quality in diffusion-based human image animation
Solves compositional misalignment between reference and target motion frames
Enhances robustness for real-world scenarios without additional training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-time Procrustes Calibration for robustness
Aligns reference and target human shapes
Model-agnostic, no extra training needed
Sunjae Yoon
KAIST
Deep Learning · Computer Vision · Generative AI
Gwanhyeong Koo
KAIST
Younghwan Lee
Korea Advanced Institute of Science and Technology (KAIST)
C. D. Yoo
Korea Advanced Institute of Science and Technology (KAIST)