FROMAT: Multiview Material Appearance Transfer via Few-Shot Self-Attention Adaptation

📅 2025-12-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multi-view diffusion models offer limited appearance manipulation, particularly over material, texture, and style, and struggle to achieve geometrically consistent, view-coherent few-shot transfer. This work proposes a few-shot appearance transfer framework tailored to multi-view diffusion models: it requires no explicit 3D representation and only a few training examples, combining object identity from an input image with appearance cues from a separate reference image. Methodologically, the paper runs three diffusion denoising processes in parallel, responsible for the original object, the reference, and the target, and during reverse sampling aggregates a small subset of layer-wise self-attention features from the object and reference branches to guide the target generation. Experiments across multiple datasets show improvements over state-of-the-art methods, yielding high-fidelity, multi-view-consistent outputs with strong appearance controllability while maintaining efficient inference and lightweight deployment.

📝 Abstract
Multiview diffusion models have rapidly emerged as a powerful tool for content creation with spatial consistency across viewpoints, offering rich visual realism without requiring explicit geometry and appearance representation. However, compared to meshes or radiance fields, existing multiview diffusion models offer limited appearance manipulation, particularly in terms of material, texture, or style. In this paper, we present a lightweight adaptation technique for appearance transfer in multiview diffusion models. Our method learns to combine object identity from an input image with appearance cues rendered in a separate reference image, producing multi-view-consistent output that reflects the desired materials, textures, or styles. This allows explicit specification of appearance parameters at generation time while preserving the underlying object geometry and view coherence. We leverage three diffusion denoising processes responsible for generating the original object, the reference, and the target images, and perform reverse sampling to aggregate a small subset of layer-wise self-attention features from the object and the reference to influence the target generation. Our method requires only a few training examples to introduce appearance awareness to pretrained multiview models. The experiments show that our method provides a simple yet effective way toward multiview generation with diverse appearance, advocating the adoption of implicit generative 3D representations in practice.
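The attention-sharing mechanism described above can be illustrated with a minimal sketch: the target branch's self-attention queries attend over keys and values pooled from all three denoising branches (target, object, reference), so object identity and reference appearance both influence the generated views. This is a hedged numpy illustration of the general extended-self-attention idea; the function and variable names are invented for clarity and are not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def extended_self_attention(q_tgt, kv_tgt, kv_obj, kv_ref):
    """Illustrative: the target query attends over keys/values
    aggregated from the target, object, and reference branches,
    letting reference appearance features steer target generation."""
    k = np.concatenate([kv_tgt[0], kv_obj[0], kv_ref[0]], axis=0)  # (3N, d)
    v = np.concatenate([kv_tgt[1], kv_obj[1], kv_ref[1]], axis=0)  # (3N, d)
    d = q_tgt.shape[-1]
    attn = softmax(q_tgt @ k.T / np.sqrt(d))  # (N, 3N) attention weights
    return attn @ v                           # (N, d) aggregated features

# Toy shapes: N tokens per view, feature dimension d.
rng = np.random.default_rng(0)
N, d = 4, 8
q = rng.standard_normal((N, d))
# One (key, value) pair per denoising branch: target, object, reference.
branches = [(rng.standard_normal((N, d)), rng.standard_normal((N, d)))
            for _ in range(3)]
out = extended_self_attention(q, *branches)
print(out.shape)  # (4, 8)
```

In the paper this substitution is applied only to a small subset of self-attention layers, which is what keeps the adaptation lightweight.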
Problem

Research questions and friction points this paper is trying to address.

Enables material and texture transfer in multiview diffusion models
Preserves object geometry and view consistency during appearance manipulation
Requires few training examples to adapt pretrained multiview models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight adaptation for multiview appearance transfer
Few-shot self-attention feature aggregation from reference
Preserves geometry and view coherence during generation