🤖 AI Summary
Existing 3D garment modeling approaches suffer from limited interactivity, insufficient geometric fidelity, and constrained design diversity, hindering practical fashion design workflows. This paper introduces the first dialogue-driven, multimodal garment modeling paradigm that enables end-to-end generation and interactive editing of 3D sewing patterns from image or text inputs. Our key contributions are: (1) direct fine-tuning of a vision-language model (VLM) to output structured JSON encoding semantic and geometric attributes, bypassing intermediate representations and directly driving parametric pattern generation; (2) extension of the GarmentCode architecture and construction of a large-scale, aligned image–text–pattern dataset; and (3) integrated real-time draping simulation and natural-language-guided pattern editing. Experiments demonstrate high-fidelity reconstruction and controllable synthesis across diverse inputs, including real images, sketches, and textual descriptions, significantly improving efficiency in fashion design and game asset creation.
📝 Abstract
We introduce ChatGarment, a novel approach that leverages large vision-language models (VLMs) to automate the estimation, generation, and editing of 3D garments from images or text descriptions. Unlike previous methods that struggle in real-world scenarios or lack interactive editing capabilities, ChatGarment can estimate sewing patterns from in-the-wild images or sketches, generate them from text descriptions, and edit garments based on user instructions, all within an interactive dialogue. These sewing patterns can then be draped into 3D garments, which are easily animatable and simulatable. This is achieved by finetuning a VLM to directly generate a JSON file that includes textual descriptions of garment types and styles as well as continuous numerical attributes. This JSON file is then used to create sewing patterns through a programming parametric model. To support this, we refine the existing programming model, GarmentCode, by expanding its garment type coverage and simplifying its structure for efficient VLM fine-tuning. Additionally, we construct a large-scale dataset of image-to-sewing-pattern and text-to-sewing-pattern pairs through an automated data pipeline. Extensive evaluations demonstrate ChatGarment's ability to accurately reconstruct, generate, and edit garments from multimodal inputs, highlighting its potential to revolutionize workflows in fashion and gaming applications. Code and data will be available at https://chatgarment.github.io/.
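To make the pipeline concrete, the sketch below shows the kind of structured JSON a finetuned VLM could emit to drive a parametric pattern model. All field names and values here are illustrative assumptions, not the paper's actual schema:

```python
import json

# Hypothetical VLM output: discrete garment-type/style labels plus
# continuous numerical attributes that a parametric pattern program
# (e.g. GarmentCode) could consume. Names and ranges are assumed.
vlm_output = {
    "garment_type": "dress",
    "style": {"sleeve": "puff", "neckline": "v-neck"},
    "numerical_attributes": {
        "skirt_length": 0.72,   # normalized: 0 = mini, 1 = floor-length
        "waist_width": 0.35,
        "sleeve_length": 0.40,
    },
}

# Serialize to the JSON file that would be handed to the parametric
# model for sewing-pattern generation and subsequent 3D draping.
json_str = json.dumps(vlm_output, indent=2)
print(json_str)
```

Mixing categorical fields (garment type, style) with continuous scalars in one JSON object mirrors the paper's point that a single VLM call can supply both the semantic and the geometric inputs a parametric generator needs.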