AI Summary
This work addresses the challenges of reconstructing real-world garment sewing patterns from a single image, namely the scarcity of paired data, poor generalization, and the difficulty of modeling multi-layered structures. The authors propose a training-free pipeline that introduces Natural Garment Language (NGL) as an intermediate representation. Through prompt engineering, large vision-language models (VLMs) are guided to extract structured garment parameters directly from in-the-wild images, which are then deterministically mapped to GarmentCode patterns. This approach achieves, for the first time, fine-tuning-free reconstruction of multi-layer garment patterns, eliminating the reliance on synthetic single-layer data common in prior methods. Evaluated on Dress4D, CloSe, and a newly collected dataset of 5,000 in-the-wild images, the method attains state-of-the-art geometric accuracy and significantly outperforms baselines in both human and GPT-based assessments.
Abstract
Estimating sewing patterns from images is a practical approach for creating high-quality 3D garments. Due to the lack of real-world pattern-image paired data, prior approaches fine-tune large vision-language models (VLMs) on synthetic garment datasets generated by randomly sampling from a parametric garment model, GarmentCode. However, these methods often struggle to generalize to in-the-wild images, fail to capture real-world correlations between garment parts, and are typically restricted to single-layer outfits. In contrast, we observe that VLMs are effective at describing garments in natural language, yet perform poorly when asked to directly regress GarmentCode parameters from images. To bridge this gap, we propose NGL (Natural Garment Language), a novel intermediate language that restructures GarmentCode into a representation more understandable to language models. Leveraging this language, we introduce NGL-Prompter, a training-free pipeline that queries large VLMs to extract structured garment parameters, which are then deterministically mapped to valid GarmentCode. We evaluate our method on Dress4D, CloSe, and a newly collected dataset of approximately 5,000 in-the-wild fashion images. Our approach achieves state-of-the-art performance on standard geometry metrics and is strongly preferred over existing baselines in both human and GPT-based perceptual evaluations. Furthermore, NGL-Prompter can recover multi-layer outfits whereas competing methods focus mostly on single-layer garments, highlighting its strong generalization to real-world images even with occluded parts. These results demonstrate that accurate sewing pattern reconstruction is possible without costly model training. Our code and data will be released for research use.
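The core pipeline idea can be sketched as follows: prompt a VLM to return a structured natural-language garment description, then deterministically map that description to parametric pattern values. This is a minimal illustrative sketch, not the paper's implementation; the NGL field names, value vocabularies, numeric mappings, and the simulated VLM response below are all hypothetical stand-ins for the actual NGL schema and GarmentCode parameters.

```python
import json

# Hypothetical prompt asking the VLM for a structured NGL-style description.
# The schema (keys and allowed values) is invented for illustration.
NGL_PROMPT = (
    "Describe the outer garment in this image as JSON with keys: "
    "garment_type (shirt|dress|skirt|pants), sleeve_length (none|short|long), "
    "neckline (round|v|square), fit (tight|regular|loose)."
)

# Illustrative deterministic mappings from NGL terms to numeric
# pattern parameters (stand-ins for GarmentCode-style parameters).
SLEEVE_LENGTH = {"none": 0.0, "short": 0.35, "long": 1.0}
FIT_EASE = {"tight": 0.0, "regular": 0.05, "loose": 0.15}

def ngl_to_pattern_params(ngl: dict) -> dict:
    """Deterministically map an NGL description to pattern parameters."""
    return {
        "base_template": ngl["garment_type"],
        "sleeve_length_ratio": SLEEVE_LENGTH[ngl["sleeve_length"]],
        "neckline_shape": ngl["neckline"],
        "body_ease": FIT_EASE[ngl["fit"]],
    }

# Simulated VLM response for one image (no API call is made here).
vlm_response = (
    '{"garment_type": "dress", "sleeve_length": "short", '
    '"neckline": "v", "fit": "loose"}'
)
params = ngl_to_pattern_params(json.loads(vlm_response))
print(params)
```

Because the language-to-parameter mapping is a fixed lookup rather than a learned regressor, the output is always a valid point in the parameter space, which is the property the paper relies on to guarantee valid GarmentCode output.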