🤖 AI Summary
Existing CAD generation methods lack sufficient representational capacity, which hinders high-precision parametric modeling guided by multimodal inputs (images, text, point clouds) in industrial design. To address this, we propose CMT, the first multimodal CAD generation framework based on Boundary Representation (B-Rep). First, we introduce a novel cascaded MAR architecture that explicitly encodes the geometric priors of “edge–loop–face” relationships. Second, we design a lightweight topology predictor that infers valid B-Rep topological structures end to end, directly from the compact MAR tokens. Third, we construct mmABC, a million-scale multimodal CAD dataset of over 1.3M B-Rep models annotated with point clouds, text descriptions, and multi-view images. Compared with state-of-the-art methods, unconditional generation on the ABC dataset improves Coverage by 10.68% and Validity by 10.3%, and image-conditioned generation on mmABC reduces Chamfer distance by 4.01. The code, pretrained models, and the mmABC dataset will be released.
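To make the edge–loop–face hierarchy and the notion of "valid B-Rep topology" concrete, here is a minimal sketch. It is not CMT's representation or its topology predictor. The classes `Edge`, `Loop`, `Face` and the function `is_topologically_valid` are hypothetical illustrations. The check is deliberately coarse: it ignores geometry, edge orientation (coedges), and non-manifold cases. It only asks that every loop closes and that every edge is shared by exactly two loop uses, which is the usual watertightness condition for a closed solid.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge record: a curve segment between two vertex ids."""
    start: int
    end: int


@dataclass
class Loop:
    """An ordered cycle of edges bounding part of a face."""
    edges: list[Edge]

    def is_closed(self) -> bool:
        # Closed if each edge ends where the next one begins, wrapping around.
        n = len(self.edges)
        return n > 0 and all(
            self.edges[i].end == self.edges[(i + 1) % n].start for i in range(n)
        )


@dataclass
class Face:
    """A surface patch bounded by an outer loop and optional inner loops."""
    loops: list[Loop] = field(default_factory=list)


def is_topologically_valid(faces: list[Face]) -> bool:
    """Coarse check (hypothetical): every face has at least one closed loop,
    and every undirected edge is used by exactly two loops (watertight)."""
    uses: Counter = Counter()
    for face in faces:
        if not face.loops:
            return False
        for loop in face.loops:
            if not loop.is_closed():
                return False
            for e in loop.edges:
                uses[frozenset((e.start, e.end))] += 1
    return bool(uses) and all(count == 2 for count in uses.values())


def tri(a: int, b: int, c: int) -> Face:
    """Helper: a triangular face with a single three-edge loop."""
    return Face([Loop([Edge(a, b), Edge(b, c), Edge(c, a)])])


# Usage: a tetrahedron's four triangular faces share each edge exactly twice.
tetra = [tri(0, 1, 2), tri(0, 3, 1), tri(1, 3, 2), tri(0, 2, 3)]
print(is_topologically_valid(tetra))      # True
print(is_topologically_valid(tetra[:1]))  # False: edges used only once
```

A generator that predicts faces, loops, and edges independently can easily produce loops that do not close or edges that are not shared. That is the kind of failure a Validity ratio counts.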
📝 Abstract
While accurate and user-friendly Computer-Aided Design (CAD) is crucial for industrial design and manufacturing, existing methods still struggle to achieve it because of over-simplified representations or architectures that cannot support multimodal design requirements. In this paper, we tackle this problem from both the method and the dataset perspectives. First, we propose a cascade MAR with topology predictor (CMT), the first multimodal framework for CAD generation based on Boundary Representation (B-Rep). Specifically, the cascade MAR effectively captures the "edge-contour-surface" priors that are essential in B-Reps, while the topology predictor directly estimates B-Rep topology from the compact tokens in MAR. Second, to facilitate large-scale training, we develop a large-scale multimodal CAD dataset, mmABC, which includes over 1.3 million B-Rep models with multimodal annotations: point clouds, text descriptions, and multi-view images. Extensive experiments show the superiority of CMT in both conditional and unconditional CAD generation tasks. For example, in unconditional generation on ABC, we improve Coverage and Valid ratio by +10.68% and +10.3%, respectively, over state-of-the-art methods. CMT also improves Chamfer distance by 4.01 on image-conditioned CAD generation on mmABC. The dataset, code, and pretrained network will be released.
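The abstract reports Coverage and Chamfer distance without defining them. The sketch below follows the conventions commonly used in point-cloud and B-Rep generation work. The exact choices are assumptions rather than details taken from the paper, including:

- the number of sampled points,
- the normalization,
- squared versus unsquared distances,
- any scale factor applied to reported Chamfer numbers.

Here Chamfer distance is the symmetric mean of squared nearest-neighbour distances. Coverage is the fraction of reference shapes that are the Chamfer nearest neighbour of at least one generated shape. The function names are illustrative.

```python
from __future__ import annotations

import numpy as np


def chamfer_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets x (N, 3) and y (M, 3),
    using mean squared nearest-neighbour distances in both directions."""
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # (N, M) squared distances
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def coverage(generated: list[np.ndarray], reference: list[np.ndarray]) -> float:
    """Fraction of reference shapes matched as the nearest neighbour
    (by Chamfer distance) of at least one generated shape."""
    cd = np.array([[chamfer_distance(g, r) for r in reference] for g in generated])
    matched = set(cd.argmin(axis=1).tolist())
    return len(matched) / len(reference)


# Usage: both generated shapes collapse onto the first reference shape,
# so only half of the reference set is covered.
rng = np.random.default_rng(0)
ref = [rng.normal(size=(64, 3)), rng.normal(size=(64, 3)) + 5.0]
gen = [ref[0] + 0.01, ref[0] - 0.01]
print(coverage(gen, ref))  # 0.5
```

Coverage rewards diversity: a model that produces many near-identical shapes scores low even if each shape is individually accurate. A lower Chamfer distance means generated geometry lies closer to the target, which is why an "improvement" of 4.01 corresponds to a reduction.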