PhyCustom: Towards Realistic Physical Customization in Text-to-Image Generation

📅 2025-12-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current text-to-image diffusion models excel at customizing abstract concepts such as style and shape but struggle to accurately model real-world physical attributes (e.g., elasticity, fluidity, and rigidity) because no explicit physical priors are introduced during training. To address this, the authors propose PhyCustom, a physics-knowledge-enhanced diffusion fine-tuning framework. PhyCustom introduces two novel regularization losses: (i) an isometric loss that enforces alignment between latent-space variations of physical attributes and geometric/dynamic constraints, and (ii) a decouple loss that suppresses entanglement between physical concepts and irrelevant factors (e.g., style or shape). The authors perform fine-grained adaptation of Stable Diffusion on a multi-source physical image dataset. Experiments demonstrate that PhyCustom significantly improves controllability over physical attributes, outperforming state-of-the-art methods both quantitatively (a +23.6% improvement in physical consistency) and qualitatively in generation fidelity.

📝 Abstract
Recent diffusion-based text-to-image customization methods have achieved significant success in understanding concrete concepts, such as styles and shapes, to control the generation process. However, few efforts tackle the realistic yet challenging customization of physical concepts. The core limitation of current methods is the absence of explicitly introduced physical knowledge during training. Even when physics-related words appear in the input text prompts, our experiments consistently demonstrate that these methods fail to accurately reflect the corresponding physical properties in the generated results. In this paper, we propose PhyCustom, a fine-tuning framework comprising two novel regularization losses that activate diffusion models to perform physical customization. Specifically, the proposed isometric loss activates diffusion models to learn physical concepts, while the decouple loss eliminates the entangled learning of independent concepts. Experiments on a diverse dataset show that PhyCustom outperforms previous state-of-the-art and popular methods in physical customization, both quantitatively and qualitatively.
Problem

Research questions and friction points this paper is trying to address.

How to enable realistic physical-property customization in text-to-image generation
How to introduce physical knowledge into diffusion models so physical concepts are learned accurately
How to decouple independent physical concepts from unrelated factors to improve generation quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces two novel regularization losses for fine-tuning diffusion models
An isometric loss that activates diffusion models to learn physical concepts
A decouple loss that prevents entangled learning of independent concepts, improving customization accuracy
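The summary above does not give the authors' loss formulas. As a rough, hypothetical sketch of what such regularizers could look like (the function names, inputs, and exact forms below are assumptions for illustration, not the paper's definitions), an isometric term might preserve pairwise latent distances under a physical-attribute change, while a decoupling term might penalize overlap between a physical-concept direction and an unrelated style direction:

```python
import numpy as np

def isometric_loss(z_src, z_tgt):
    """Hypothetical isometric regularizer (not the paper's formula):
    encourage pairwise distances among latents to be preserved when a
    physical attribute is varied. z_src, z_tgt: (N, D) latent batches."""
    d_src = np.linalg.norm(z_src[:, None, :] - z_src[None, :, :], axis=-1)
    d_tgt = np.linalg.norm(z_tgt[:, None, :] - z_tgt[None, :, :], axis=-1)
    return float(np.mean((d_src - d_tgt) ** 2))

def decouple_loss(v_phys, v_style):
    """Hypothetical decoupling regularizer (not the paper's formula):
    penalize squared cosine similarity between a physical-concept
    direction and an unrelated (e.g., style) direction."""
    cos = np.dot(v_phys, v_style) / (
        np.linalg.norm(v_phys) * np.linalg.norm(v_style) + 1e-8
    )
    return float(cos ** 2)
```

Under this sketch, the isometric term vanishes when the attribute edit is distance-preserving, and the decoupling term vanishes when the two concept directions are orthogonal.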
Fan Wu
Nanyang Technological University
Cheng Chen
Nanyang Technological University
Zhoujie Fu
Nanyang Technological University
Computer Vision
Jiacheng Wei
Nanyang Technological University
Yi Xu
Goertek Alpha Labs
Deheng Ye
Director of AI, Tencent
Applied machine learning
Guosheng Lin
Nanyang Technological University
Computer Vision, Machine Learning