🤖 AI Summary
Existing feedforward approaches struggle to handle the highly variable poses and viewpoints present in real-world garment images, while optimization-based methods are computationally expensive and difficult to scale. This work proposes an end-to-end feedforward framework that reconstructs physically consistent 2D sewing patterns and their corresponding 3D garments from a single real image, without requiring multi-view inputs or iterative optimization. The method integrates a vision-language model for image-level pose normalization and fuses 3D-aware garment features through a transformer encoder to generate editable, simulation-ready patterns. Experiments demonstrate that the approach robustly recovers high-quality results across diverse real-world images, enabling downstream applications such as physical simulation, texture synthesis, and multi-layer virtual try-on.
📝 Abstract
Recent advances in garment pattern generation have shown promising progress. However, existing feed-forward methods struggle with diverse poses and viewpoints, while optimization-based approaches are computationally expensive and difficult to scale. This paper focuses on sewing pattern generation for garment modeling and fabrication applications that demand editable, separable, and simulation-ready garments. We propose DressWild, a novel feed-forward pipeline that reconstructs physics-consistent 2D sewing patterns and the corresponding 3D garments from a single in-the-wild image. Given an input image, our method leverages vision-language models (VLMs) to normalize pose variations at the image level and then extracts pose-aware, 3D-informed garment features. These features are fused through a transformer-based encoder and subsequently used to predict sewing pattern parameters, which can be applied directly to physical simulation, texture synthesis, and multi-layer virtual try-on. Extensive experiments demonstrate that our approach robustly recovers diverse sewing patterns and the corresponding 3D garments from in-the-wild images without requiring multi-view inputs or iterative optimization, offering an efficient and scalable solution for realistic garment simulation and animation.
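The feature-fusion stage described in the abstract (per-patch garment features fused by a transformer-style encoder, then regressed to sewing-pattern parameters) can be sketched at a very high level. Everything below is a hypothetical illustration, not the authors' released code: the function names (`attention_fuse`), feature shapes, and the single-head self-attention stand-in for the full transformer encoder are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(tokens, d, rng):
    # Single-head self-attention: a minimal stand-in for the
    # transformer-based encoder that fuses garment features.
    Wq = rng.standard_normal((d, d)) * 0.02
    Wk = rng.standard_normal((d, d)) * 0.02
    Wv = rng.standard_normal((d, d)) * 0.02
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))  # (n_tokens, n_tokens)
    return attn @ v                       # fused tokens, same shape as input

rng = np.random.default_rng(0)
d, n_tokens, n_params = 64, 16, 10

# Placeholder for pose-aware, 3D-informed per-patch garment features
# (in the paper these come from the pose-normalized input image).
features = rng.standard_normal((n_tokens, d))

fused = attention_fuse(features, d, rng)

# Pool fused tokens and regress a (hypothetical) vector of
# sewing-pattern parameters with a linear head.
W_out = rng.standard_normal((d, n_params)) * 0.02
pattern_params = fused.mean(axis=0) @ W_out
print(pattern_params.shape)  # → (10,)
```

In a real implementation the attention block would be a multi-layer, multi-head encoder with learned weights, and the output head would predict structured pattern parameters (panel shapes, seams) rather than a flat vector; this sketch only shows the data flow from fused features to parameters.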