🤖 AI Summary
This work addresses the challenge that naively injecting reference appearance into the global self-attention of diffusion Transformers (DiTs) disrupts scene structure, hindering high-fidelity, controllable reference-based appearance transfer without retraining. The authors propose the first training-free DiT framework for appearance transfer, achieving precise fine-grained texture migration while preserving geometric structure through three key components: disentanglement of structure and appearance features, high-fidelity inversion, and a geometry-prior-guided dynamic attention-sharing mechanism. Evaluated at 1024px resolution, the method attains state-of-the-art performance, outperforming specialized models on both semantic attribute and material transfer tasks while significantly improving structural consistency and appearance fidelity.
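For intuition, here is a minimal sketch of what the inversion step might look like, assuming a rectified-flow DiT whose network predicts a velocity field; `velocity_model`, its signature, and the fixed Euler schedule are illustrative stand-ins, not the paper's implementation:

```python
import torch

@torch.no_grad()
def invert(velocity_model, x0, num_steps=50):
    """Integrate the flow ODE from the clean latent x0 (t=0) toward noise
    (t=1) with forward Euler steps, caching intermediate latents so they
    can serve as a content prior (lighting, micro-texture) during editing."""
    timesteps = torch.linspace(0.0, 1.0, num_steps + 1)
    x, trajectory = x0, [x0]
    for i in range(num_steps):
        t, t_next = timesteps[i], timesteps[i + 1]
        v = velocity_model(x, t)   # predicted velocity dx/dt at time t
        x = x + (t_next - t) * v   # Euler step toward the noise endpoint
        trajectory.append(x)
    return x, trajectory
```

The cached trajectory is what later stages would reuse to keep the source image's fine detail anchored while appearance is edited.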
📝 Abstract
Diffusion Transformers (DiTs) excel at generation, but their global self-attention makes controllable, reference-image-based editing a distinct challenge. Unlike U-Nets, naively injecting local appearance into a DiT can disrupt its holistic scene structure. We address this by proposing the first training-free framework specifically designed to tame DiTs for high-fidelity appearance transfer. At its core is a synergistic system that disentangles structure and appearance. We leverage high-fidelity inversion to establish a rich content prior for the source image, capturing its lighting and micro-textures. A novel attention-sharing mechanism then dynamically fuses purified appearance features from a reference, guided by geometric priors. Our unified approach operates at 1024px and outperforms specialized methods on tasks ranging from semantic attribute transfer to fine-grained material application. Extensive experiments confirm our state-of-the-art performance in both structural preservation and appearance fidelity.
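As a rough illustration of attention sharing, the sketch below assumes the mechanism concatenates reference keys/values into the target branch's self-attention and masks them with a geometric-prior correspondence map; all tensor names are hypothetical, and the "dynamic" scheduling of when and how strongly to share is omitted:

```python
import torch

def shared_attention(q_tgt, k_tgt, v_tgt, k_ref, v_ref, geo_mask):
    """q_tgt/k_tgt/v_tgt: (B, H, N, d) target-branch projections;
    k_ref/v_ref: (B, H, M, d) reference-branch keys and values;
    geo_mask: (N, M) bool prior, True where a target token may draw
    appearance from a reference token. (Illustrative names only.)"""
    scale = q_tgt.shape[-1] ** -0.5
    k = torch.cat([k_tgt, k_ref], dim=2)            # (B, H, N+M, d)
    v = torch.cat([v_tgt, v_ref], dim=2)
    logits = (q_tgt @ k.transpose(-2, -1)) * scale  # (B, H, N, N+M)
    n = k_tgt.shape[2]
    # Target tokens always attend to themselves (structure preservation);
    # reference tokens are visible only where the geometric prior permits.
    logits[..., n:] = logits[..., n:].masked_fill(~geo_mask, float("-inf"))
    return logits.softmax(dim=-1) @ v
```

Keeping the target's own keys unmasked is what preserves scene layout, while the geometric prior confines appearance flow to corresponding regions of the reference.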