BrepLLM: Native Boundary Representation Understanding with Large Language Models

📅 2025-12-18
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) struggle to directly parse the complex geometric-topological structures encoded in native 3D boundary representation (B-rep) data, creating a fundamental modality gap between 3D geometry and natural language. Method: This work introduces the first end-to-end framework enabling LLMs to understand and reason over raw B-rep data. It employs a two-stage training paradigm: (i) adaptive UV-sampling-based graph modeling and a hierarchical BrepEncoder for joint geometric-topological feature extraction; (ii) a Mixture-of-Query Experts (MQE) and cross-modal contrastive learning, integrating CLIP ViT-L/14 text encoding with MLP-based semantic projection, followed by three-stage progressive fine-tuning. Contribution/Results: The authors construct Brep2Text, the first large-scale B-rep-to-text dataset (269K pairs), and achieve state-of-the-art performance on both 3D shape classification and descriptive text generation tasks.
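To make the "UV-sampling-based graph modeling" step concrete, here is a minimal, hypothetical sketch: each B-rep face is sampled on a grid in its (u, v) parameter domain, and the resulting point grids become node features of a graph whose edges follow face adjacency in the topology. The `surface_fn` callback, grid size, and feature layout are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def uv_sample_face(surface_fn, n=4):
    """Sample a parametric face on an n-by-n UV grid.

    surface_fn maps (u, v) in [0, 1]^2 to a 3D point; here it is a
    hypothetical stand-in for querying the B-rep geometry kernel.
    """
    u = np.linspace(0.0, 1.0, n)
    v = np.linspace(0.0, 1.0, n)
    uu, vv = np.meshgrid(u, v, indexing="ij")
    pts = np.stack([surface_fn(a, b) for a, b in zip(uu.ravel(), vv.ravel())])
    return pts.reshape(n, n, 3)  # one geometric feature grid per face

# Example: a cylindrical patch standing in for one face's geometry.
cyl = lambda u, v: np.array([np.cos(2 * np.pi * u), np.sin(2 * np.pi * u), v])
grid = uv_sample_face(cyl, n=4)
# In the full pipeline, nodes (faces with their UV grids) plus
# adjacency edges form the graph consumed by the BrepEncoder.
```

In an adaptive variant, `n` would vary per face with surface complexity; a fixed grid is used here purely for brevity.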

๐Ÿ“ Abstract
Current token-sequence-based Large Language Models (LLMs) are not well suited to directly processing 3D Boundary Representation (Brep) models, which contain complex geometric and topological information. We propose BrepLLM, the first framework that enables LLMs to parse and reason over raw Brep data, bridging the modality gap between structured 3D geometry and natural language. BrepLLM employs a two-stage training pipeline: cross-modal alignment pre-training and multi-stage LLM fine-tuning. In the first stage, an adaptive UV sampling strategy converts Breps into graph representations carrying geometric and topological information. A hierarchical BrepEncoder then extracts features from geometry (i.e., faces and edges) and topology, producing both a single global token and a sequence of node tokens. The global token is aligned with text embeddings from a frozen CLIP text encoder (ViT-L/14) via contrastive learning. In the second stage, we integrate the pretrained BrepEncoder into an LLM and align its sequence of node tokens using a three-stage progressive training strategy: (1) training an MLP-based semantic mapping from Brep representations to the 2D domain with 2D-LLM priors; (2) fine-tuning the LLM; (3) designing a Mixture-of-Query Experts (MQE) to enhance geometric diversity modeling. We also construct Brep2Text, a dataset comprising 269,444 Brep-text question-answer pairs. Experiments show that BrepLLM achieves state-of-the-art (SOTA) results on 3D object classification and captioning tasks.
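The first-stage alignment described above is standard CLIP-style contrastive learning; a minimal sketch, assuming a symmetric InfoNCE objective over matched Brep/text batches (the temperature value and exact loss form are assumptions):

```python
import numpy as np

def clip_style_loss(brep_tokens, text_embs, temperature=0.07):
    """Symmetric contrastive loss aligning global Brep tokens with
    frozen CLIP text embeddings. Both inputs: (batch, dim)."""
    b = brep_tokens / np.linalg.norm(brep_tokens, axis=1, keepdims=True)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = b @ t.T / temperature       # cosine similarities, scaled
    labels = np.arange(len(logits))      # matching pairs on the diagonal

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)       # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the Brep->text and text->Brep directions, as in CLIP.
    return 0.5 * (xent(logits) + xent(logits.T))

# Perfectly matched, mutually orthogonal pairs give a near-zero loss.
pairs = np.eye(4)
loss = clip_style_loss(pairs, pairs)
```

In training, gradients of this loss would update the BrepEncoder while the CLIP text encoder stays frozen.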
Problem

Research questions and friction points this paper is trying to address.

Enabling LLMs to process complex 3D boundary representation data directly
Bridging the modality gap between structured 3D geometry and natural language
Improving 3D object classification and captioning through native Brep understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage training pipeline with cross-modal alignment
Hierarchical encoder for geometry and topology features
Mixture-of-Query Experts for geometric diversity modeling
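The Mixture-of-Query Experts can be pictured as several learnable query sets plus a router that softly weights them per input. The sketch below is an illustrative guess at that structure (dimensions, routing, and the blending rule are all assumptions, not the paper's exact design):

```python
import numpy as np

rng = np.random.default_rng(0)

class MixtureOfQueryExperts:
    """Hypothetical MQE: E expert query sets, softly blended per input."""

    def __init__(self, n_experts=4, n_queries=8, dim=16):
        # Each expert holds its own learnable set of query vectors.
        self.queries = rng.normal(size=(n_experts, n_queries, dim))
        # Linear router scoring experts from the global Brep token.
        self.router_w = rng.normal(size=(dim, n_experts))

    def __call__(self, global_token):
        scores = global_token @ self.router_w
        scores = scores - scores.max()                    # stability
        gates = np.exp(scores) / np.exp(scores).sum()     # softmax over experts
        # Convex combination of expert query sets -> one query set
        # that would attend over the node-token sequence for the LLM.
        return np.tensordot(gates, self.queries, axes=1)  # (n_queries, dim)

out = MixtureOfQueryExperts()(np.ones(16))
```

Soft gating keeps every expert differentiable; a sparse top-k router would be a natural alternative design choice.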
Authors
Liyuan Deng, Professor, Chemical Engineering Dept., Norwegian University of Science and Technology
Hao Guo, Northwestern Polytechnical University
Yunpeng Bai, National University of Singapore
Yongkang Dai, Northwestern Polytechnical University
Huaxi Huang, Shanghai Artificial Intelligence Laboratory
Yilei Shi, Augmented Human Lab, Singapore University of Technology and Design