SAFT: Shape and Appearance of Fabrics from Template via Differentiable Physical Simulations from Monocular Video

📅 2025-09-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the depth ambiguity challenge in joint 3D geometric and appearance reconstruction of dynamic cloth from monocular RGB video. We propose an end-to-end optimization framework that tightly integrates differentiable physics-based simulation with differentiable rendering. Our method employs a parametric cloth template and introduces two novel regularizers: (1) a physics-consistency term enforcing adherence to cloth dynamics equations, and (2) a multi-scale appearance-consistency term enhancing robustness in texture detail recovery. By jointly optimizing geometry and appearance, the approach significantly mitigates monocular depth ambiguity. On standard benchmarks, our method reduces 3D reconstruction error by 2.64× over state-of-the-art methods, with an average runtime of 30 minutes per scene. It successfully recovers fine-grained dynamic wrinkles and photorealistic textures, demonstrating superior fidelity in both structural and visual domains.
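The physics-consistency regularizer described above can be illustrated with a minimal sketch: a discrete cloth-dynamics residual (here a simple implicit finite-difference form of M·a − f(x) = 0) added to the rendering loss. The function names, the uniform vertex mass, the time-stepping scheme, and the weight `w_phys` are all illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def physics_residual(x_next, x_curr, x_prev, forces, mass, dt):
    """Finite-difference residual of the cloth dynamics: M*a - f(x).
    x_* are vertex positions of shape (n, 3); forces is f evaluated at x_next.
    The residual vanishes when the optimized motion obeys the dynamics."""
    accel = (x_next - 2.0 * x_curr + x_prev) / dt**2
    return mass * accel - forces

def total_loss(render_err, x_next, x_curr, x_prev, forces,
               mass=1.0, dt=1.0 / 30.0, w_phys=0.1):
    """Photometric loss plus the physics-consistency term.
    Weights and the L2 penalty are illustrative choices."""
    r = physics_residual(x_next, x_curr, x_prev, forces, mass, dt)
    return render_err + w_phys * np.mean(r**2)
```

For a motion that already satisfies the (force-free) dynamics, e.g. constant-velocity vertices, the regularizer contributes nothing and the loss reduces to the photometric term.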

📝 Abstract
The reconstruction of three-dimensional dynamic scenes is a well-established yet challenging task within the domain of computer vision. In this paper, we propose a novel approach that combines the domains of 3D geometry reconstruction and appearance estimation for physically based rendering and present a system that is able to perform both tasks for fabrics, utilizing only a single monocular RGB video sequence as input. In order to obtain realistic and high-quality deformations and renderings, a physical simulation of the cloth geometry and differentiable rendering are employed. In this paper, we introduce two novel regularization terms for the 3D reconstruction task that improve the plausibility of the reconstruction by addressing the depth ambiguity problem in monocular video. In comparison with the most recent methods in the field, we have reduced the error in the 3D reconstruction by a factor of 2.64 while requiring a moderate runtime of 30 min per scene. Furthermore, the optimized motion achieves sufficient quality to perform an appearance estimation of the deforming object, recovering sharp details from this single monocular RGB video.
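The multi-scale appearance-consistency idea mentioned in the summary can be sketched as a loss summed over an image pyramid, so that coarse levels stabilize the optimization while fine levels recover sharp texture detail. The 2x2 box-filter pyramid, the number of levels, and the L2 metric are assumptions for illustration; the paper's exact term may differ.

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 box averaging (img: H x W, H and W even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def multiscale_appearance_loss(rendered, target, levels=3):
    """Sum of per-level mean squared errors over an image pyramid."""
    loss = 0.0
    for _ in range(levels):
        loss += np.mean((rendered - target) ** 2)
        rendered, target = downsample(rendered), downsample(target)
    return loss
```

When rendered and target images match exactly, every pyramid level contributes zero, so the term penalizes only genuine appearance discrepancies at any scale.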
Problem

Research questions and friction points this paper is trying to address.

Reconstructing 3D dynamic scenes from monocular video
Estimating fabric shape and appearance via physical simulation
Addressing depth ambiguity in monocular 3D reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentiable physical simulations for cloth geometry
Novel regularization terms for depth ambiguity
Monocular video-based appearance and shape estimation