🤖 AI Summary
Existing virtual try-on methods rely on multi-view images or physics-based priors, making it difficult to achieve dynamic interaction and full 4D try-on (free pose control, novel-view synthesis, and diverse garment customization) under single-view supervision. This paper introduces the first end-to-end, single-view-driven 4D virtual try-on framework. The authors propose a physics-free nonlinear Gaussian deformation network, coupled with a novel bidirectional optical-flow correction strategy, to model temporally consistent and adaptive garment dynamics. Given only a single reference garment image and a target human pose sequence, the method generates photorealistic and temporally coherent dynamic try-on results. Extensive evaluations on multiple benchmarks demonstrate significant improvements over state-of-the-art approaches. The framework enables practical applications in AR/VR, digital avatars, and gaming, advancing 4D virtual try-on toward lightweight, general-purpose deployment.
📝 Abstract
We propose AvatarVTON, the first 4D virtual try-on framework that generates realistic try-on results from a single in-shop garment image, enabling free pose control, novel-view rendering, and diverse garment choices. Unlike existing methods, AvatarVTON supports dynamic garment interactions under single-view supervision, without relying on multi-view garment captures or physics priors. The framework consists of two key modules: (1) a Reciprocal Flow Rectifier, a prior-free optical-flow correction strategy that stabilizes avatar fitting and ensures temporal coherence; and (2) a Non-Linear Deformer, which decomposes Gaussian maps into view-pose-invariant and view-pose-specific components, enabling adaptive, non-linear garment deformations. To establish a benchmark for 4D virtual try-on, we extend existing baselines with unified modules for fair qualitative and quantitative comparisons. Extensive experiments show that AvatarVTON achieves high fidelity, diversity, and dynamic garment realism, making it well-suited for AR/VR, gaming, and digital-human applications.
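The abstract does not detail how the Reciprocal Flow Rectifier corrects optical flow, but bidirectional (forward-backward) flow consistency is the standard prior-free way to detect unreliable flow: a pixel's forward flow and the backward flow sampled at its target location should cancel out. A minimal illustrative sketch of that generic check (not the paper's actual module; `warp_flow`, `fb_consistency_mask`, and the nearest-neighbor sampling are assumptions made for brevity):

```python
import numpy as np

def warp_flow(flow, disp):
    """Sample `flow` at positions displaced by `disp`.

    Uses nearest-neighbor sampling with border clamping for brevity;
    real pipelines would use bilinear interpolation.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xq = np.clip(np.rint(xs + disp[..., 0]).astype(int), 0, w - 1)
    yq = np.clip(np.rint(ys + disp[..., 1]).astype(int), 0, h - 1)
    return flow[yq, xq]

def fb_consistency_mask(fwd, bwd, tol=1.0):
    """Mark pixels where forward flow and back-warped backward flow cancel.

    fwd, bwd: (H, W, 2) flow fields in (dx, dy) order.
    Returns a boolean (H, W) mask of flow vectors deemed reliable.
    """
    bwd_at_target = warp_flow(bwd, fwd)          # backward flow at each pixel's destination
    err = np.linalg.norm(fwd + bwd_at_target, axis=-1)
    return err < tol
```

Inconsistent pixels flagged by such a mask would typically be discarded or re-estimated before they are used to supervise temporal coherence.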