🤖 AI Summary
This work addresses the longstanding challenge of high-level semantic editing of B-rep CAD models, which has been hindered by scarce annotated 3D data, the structural fragility of B-rep representations, and the absence of semantic interfaces. We propose the first multimodal large language model (mLLM) framework tailored for B-rep editing. Methodologically, we design a dedicated multimodal encoder that jointly processes natural-language instructions and B-rep topological–geometric structures; leveraging CAD kernels, we automatically synthesize high-fidelity reasoning data without manual annotation. The mLLM is then fine-tuned to perform text-driven, syntactically valid, and semantically consistent edits. Experiments demonstrate robust performance across diverse, complex editing tasks, with 100% of generated outputs passing CAD-kernel validity verification. This work bridges the gap between semantic understanding and executable editing in 3D modeling, achieving, for the first time, end-to-end controllable editing of industrial-grade B-rep models by foundation models.
📝 Abstract
Multimodal large language models (mLLMs), trained in a mixed-modal setting as universal models, have been shown to compete with or even outperform many specialized algorithms on imaging and graphics tasks. As demonstrated across many applications, mLLMs' ability to jointly process image and text data makes them suitable for zero-shot use or efficient fine-tuning towards specialized tasks. However, they have had limited success in 3D analysis and editing tasks, due both to the lack of suitable (annotated) 3D data and to the idiosyncrasies of 3D representations. In this paper, we investigate whether mLLMs can be adapted to support high-level editing of Boundary Representation (B-rep) CAD objects. B-reps remain the industry standard for precisely encoding engineering objects, but they are challenging to work with: the representation is fragile (i.e., edits can easily produce invalid CAD objects), and no publicly available data source exists with semantically annotated B-reps or CAD construction history. We present B-repLer, a fine-tuned mLLM that understands text prompts and makes semantic edits on given B-reps to produce valid outputs. We enable this via a novel multimodal architecture specifically designed to handle B-rep models, and demonstrate how existing CAD tools, in conjunction with mLLMs, can be used to automatically generate the required reasoning dataset without relying on external annotations. We extensively evaluate B-repLer and demonstrate several text-based B-rep edits of varying complexity that were not previously possible.