CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation

📅 2025-04-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing CAD generation methods rely on over-simplified representations or on architectures that cannot support multimodal design requirements, which hinders accurate, user-friendly parametric modeling guided by images, text, or point clouds. To address this, the paper proposes CMT, the first multimodal framework for CAD generation based on Boundary Representation (B-Rep). First, a cascade MAR architecture captures the “edge–loop–face” priors that are essential in B-Reps. Second, a topology predictor estimates B-Rep topology directly from the compact MAR tokens. Third, the authors build mmABC, a large-scale multimodal CAD dataset with over 1.3 million B-Rep models annotated with point clouds, text descriptions, and multi-view images. In unconditional generation on ABC, CMT improves Coverage by 10.68% and Valid ratio by 10.3% over state-of-the-art methods; in image-conditioned generation on mmABC, it improves Chamfer distance by 4.01. The authors state that the dataset, code, and pretrained network will be released.

📝 Abstract
While accurate and user-friendly Computer-Aided Design (CAD) is crucial for industrial design and manufacturing, existing methods still struggle to achieve it because of over-simplified representations or architectures that cannot support multimodal design requirements. In this paper, we tackle this problem from both the method and the dataset perspectives. First, we propose a cascade MAR with topology predictor (CMT), the first multimodal framework for CAD generation based on Boundary Representation (B-Rep). Specifically, the cascade MAR effectively captures the "edge-contours-surface" priors that are essential in B-Reps, while the topology predictor directly estimates the B-Rep topology from the compact tokens in MAR. Second, to facilitate large-scale training, we develop a large-scale multimodal CAD dataset, mmABC, which includes over 1.3 million B-Rep models with multimodal annotations, including point clouds, text descriptions, and multi-view images. Extensive experiments show the superiority of CMT in both conditional and unconditional CAD generation tasks. For example, in unconditional generation on ABC, we improve Coverage and Valid ratio by +10.68% and +10.3%, respectively, compared to state-of-the-art methods. CMT also improves Chamfer distance by 4.01 on image-conditioned CAD generation on mmABC. The dataset, code, and pretrained network will be released.
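For readers unfamiliar with the Chamfer metric quoted above, the following is a minimal pure-Python sketch of a symmetric Chamfer distance between two point sets. The exact convention (squared versus unsquared distances, mean versus sum, any scaling factor) is not given in the abstract, so the mean-of-squared-distances form used here is an assumption, and the function name `chamfer_distance` is illustrative rather than taken from the paper's code.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Assumed convention: mean squared nearest-neighbour distance,
    summed over both directions (a -> b and b -> a).
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")

    def sq_dist(p, q):
        # Squared Euclidean distance between two points.
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        # Average, over src, of the squared distance to the closest dst point.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Toy example: a unit square and a copy shifted by 0.5 along z.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x, y, z + 0.5) for x, y, z in square]

print(chamfer_distance(square, square))
print(chamfer_distance(square, shifted))
```

This brute-force version is quadratic in the number of points; evaluation code for large point clouds would typically use a spatial index or a batched GPU implementation instead.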
Problem

Research questions and friction points this paper is trying to address.

Overcoming limitations in CAD generation methods for industrial design
Proposing a multimodal framework for CAD using B-Rep representation
Creating a large-scale dataset for training multimodal CAD models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cascade MAR with topology predictor for B-Rep
Multimodal CAD dataset mmABC with annotations
Edge-contours-surface priors captured in B-Reps
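To make the edge, contour (loop), and surface (face) hierarchy of a B-Rep more concrete, here is a minimal Python sketch of that nesting together with a simple topological validity check. It is not the paper's topology predictor or its validity metric: the class names, the `is_watertight` function, and the rule that every edge of a closed solid is shared by exactly two faces are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    start: int  # id of the start vertex
    end: int    # id of the end vertex


@dataclass(frozen=True)
class Loop:
    edges: tuple  # ordered Edge objects forming one contour


@dataclass(frozen=True)
class Face:
    loops: tuple  # outer loop first, then any hole loops


def loop_is_closed(loop):
    """A loop is closed if each edge ends where the next one starts."""
    edges = loop.edges
    if not edges:
        return False
    return all(edges[i].end == edges[(i + 1) % len(edges)].start
               for i in range(len(edges)))


def is_watertight(faces):
    """Assumed validity rule: all loops closed, every edge used by two faces."""
    loops = [loop for face in faces for loop in face.loops]
    # Count each undirected edge once per loop that references it.
    uses = Counter(frozenset((e.start, e.end)) for loop in loops for e in loop.edges)
    return all(loop_is_closed(l) for l in loops) and all(n == 2 for n in uses.values())


def triangle(a, b, c):
    """Hypothetical helper: a face bounded by a single triangular loop."""
    return Face(loops=(Loop(edges=(Edge(a, b), Edge(b, c), Edge(c, a))),))


# A tetrahedron on vertices 0..3: four triangular faces, six shared edges.
tetra = [triangle(0, 1, 2), triangle(0, 3, 1), triangle(1, 3, 2), triangle(0, 2, 3)]

print(is_watertight(tetra))      # all four faces present
print(is_watertight(tetra[:3]))  # one face removed, so some edges are used once
```

A real B-Rep also carries geometry (curves and surfaces) on each edge and face; the sketch keeps only the connectivity, which is the part a topology check operates on.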