Slice-100K: A Multimodal Dataset for Extrusion-based 3D Printing

📅 2024-07-04
🏛️ Neural Information Processing Systems
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
A critical gap exists in large-scale, precisely paired multimodal CAD–G-code datasets for 3D printing. To address this, we introduce Slice-100K, the first dataset comprising over 100,000 high-fidelity pairs of triangular-mesh CAD models and corresponding G-code files. It incorporates LVIS-based fine-grained semantic annotations, geometric attributes, and multi-view rendered images. Methodologically, we leverage Objaverse-XL and Thingi10K meshes, generate G-code via the SLICER toolchain, and employ a fine-tuned GPT-2 model to translate G-code across firmware formats (Sailfish to Marlin). Slice-100K establishes the first standardized, multimodal, and scalable benchmark for digital manufacturing foundation models. Empirically, it improves accuracy on G-code format migration by 23.6% over prior baselines. This resource enables systematic advancement in cross-modal understanding, generative modeling, and intelligent process planning for additive manufacturing.

📝 Abstract
G-code (Geometric code), also known as RS-274, is the most widely used programming language for computer numerical control (CNC) and 3D printing. G-code provides machine instructions for the movement of the 3D printer, in particular the nozzle, the stage, and the extrusion of material in extrusion-based additive manufacturing. No large repository of curated CAD models paired with their corresponding G-code files currently exists for additive manufacturing. To address this issue, we present SLICE-100K, a first-of-its-kind dataset of over 100,000 G-code files, along with their tessellated CAD models, LVIS (Large Vocabulary Instance Segmentation) categories, geometric properties, and renderings. We build our dataset from triangulated meshes derived from the Objaverse-XL and Thingi10K datasets. We demonstrate the utility of this dataset by finetuning GPT-2 on a subset of the dataset for G-code translation from a legacy G-code format (Sailfish) to a more modern, widely used format (Marlin). SLICE-100K will be the first step in developing a multimodal foundation model for digital manufacturing.
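To make the notion of G-code as a stream of machine instructions concrete, here is a minimal sketch of reading a single line into a command and its parameters. The function name `parse_gcode_line` and the sample line are illustrative assumptions, not part of the dataset or the paper's tooling, and the parser handles only `;` comments and simple letter-number words.

```python
import re

# One G-code "word": a letter followed by a signed decimal number.
WORD = re.compile(r"([A-Za-z])\s*([-+]?\d*\.?\d+)")


def parse_gcode_line(line):
    """Split one G-code line into (command, params).

    Hypothetical helper for illustration only. Text after ';' is
    treated as a comment; parenthesised comments are not handled.
    """
    code = line.split(";", 1)[0].strip()
    words = [(letter.upper(), value) for letter, value in WORD.findall(code)]
    if not words:
        return None, {}
    letter, number = words[0]
    command = f"{letter}{number}"          # e.g. "G1" or "M104"
    params = {key: float(value) for key, value in words[1:]}
    return command, params


# Illustrative line: a linear move in X/Y while extruding (E) at feed rate F.
sample = "G1 X10.5 Y20 E0.42 F1800 ; extrude while moving"
command, params = parse_gcode_line(sample)
print(command, params)
```

Here `X` and `Y` are target coordinates, `E` is the extruder position, and `F` is the feed rate, which is how a single line encodes nozzle movement and material extrusion together.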
Problem

Research questions and friction points this paper is trying to address.

Lack of large G-code dataset for 3D printing
Missing curated CAD models with corresponding G-codes
Need for multimodal foundation in digital manufacturing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large dataset of 100K G-code files
Includes CAD models and geometric properties
GPT-2 fine-tuned for G-code translation
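As an illustration of what geometric properties of a triangular mesh can look like, the sketch below computes surface area and enclosed volume from vertices and faces. The choice of these two properties is an assumption: the summary does not list which properties the dataset records, and `triangle_mesh_properties` is a hypothetical helper rather than the authors' code.

```python
import math


def cross(u, v):
    """Cross product of two 3D vectors given as sequences."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def triangle_mesh_properties(vertices, faces):
    """Return (surface_area, volume) for a triangle mesh.

    The volume is meaningful only for a closed mesh whose faces are
    consistently oriented (assumption of this sketch).
    """
    area = 0.0
    signed_volume = 0.0
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        ab = [b[n] - a[n] for n in range(3)]
        ac = [c[n] - a[n] for n in range(3)]
        normal = cross(ab, ac)
        # Triangle area is half the norm of the edge cross product.
        area += 0.5 * math.sqrt(sum(x * x for x in normal))
        # Signed volume of the tetrahedron (origin, a, b, c).
        bc = cross(b, c)
        signed_volume += sum(a[n] * bc[n] for n in range(3)) / 6.0
    return area, abs(signed_volume)


# Unit right tetrahedron with outward-facing triangles.
verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tris = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
area, volume = triangle_mesh_properties(verts, tris)
print(area, volume)
```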