Better STEP, a format and dataset for boundary representation

📅 2025-06-04
🤖 AI Summary
Problem: Existing B-rep datasets are distributed in STEP format, which requires a CAD kernel to read and process. Per-node licensing costs make these datasets hard to deploy in large learning pipelines and on computing clusters. Method: The paper proposes an alternative B-rep data format based on the open, cross-platform HDF5 format, paired with an open-source Python library to query and process the data. Contribution/Results: (1) an HDF5-based representation for B-rep data; (2) standard functionality in the Python package, such as sampling, normals, and curvature; (3) converted versions of the ABC and Fusion 360 datasets. Four use cases (normal estimation, denoising, surface reconstruction, and segmentation) are used to assess the integrity of the converted data and its compliance with the original STEP files.

📝 Abstract
Boundary representation (B-rep) generated from computer-aided design (CAD) is widely used in industry, with several large datasets available. However, the data in these datasets is represented in STEP format, requiring a CAD kernel to read and process it. This dramatically limits their scope and usage in large learning pipelines, as it constrains the possibility of deploying them on computing clusters due to the high cost of per-node licenses. This paper introduces an alternative format based on the open, cross-platform format HDF5 and a corresponding dataset for STEP files, paired with an open-source library to query and process them. Our Python package also provides standard functionalities such as sampling, normals, and curvature to ease integration in existing pipelines. To demonstrate the effectiveness of our format, we converted the Fusion 360 dataset and the ABC dataset. We developed four standard use cases (normal estimation, denoising, surface reconstruction, and segmentation) to assess the integrity of the data and its compliance with the original STEP files.
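The abstract mentions sampling and normals as standard functionality. As a rough illustration of what those operations mean for a parametric surface, the sketch below samples a regular parameter grid and estimates unit normals from the cross product of finite-difference partial derivatives. It uses only the Python standard library; the function names (`cylinder`, `surface_normal`, `sample_grid`) are hypothetical and are not the API of the paper's package.

```python
import math

def cylinder(u, v, radius=1.0):
    """Hypothetical parametric surface: u is the angle, v the height."""
    return (radius * math.cos(u), radius * math.sin(u), v)

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def surface_normal(surface, u, v, h=1e-5):
    """Unit normal from central finite differences of the parameterization."""
    du = [(p - q) / (2 * h) for p, q in zip(surface(u + h, v), surface(u - h, v))]
    dv = [(p - q) / (2 * h) for p, q in zip(surface(u, v + h), surface(u, v - h))]
    n = cross(du, dv)
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

def sample_grid(surface, u_range, v_range, nu, nv):
    """Sample points and normals on a regular nu x nv grid (nu, nv >= 2)."""
    points, normals = [], []
    for i in range(nu):
        u = u_range[0] + (u_range[1] - u_range[0]) * i / (nu - 1)
        for j in range(nv):
            v = v_range[0] + (v_range[1] - v_range[0]) * j / (nv - 1)
            points.append(surface(u, v))
            normals.append(surface_normal(surface, u, v))
    return points, normals

# Half a unit cylinder, sampled on a 5 x 3 grid.
points, normals = sample_grid(cylinder, (0.0, math.pi), (0.0, 1.0), nu=5, nv=3)
```

For the unit cylinder, each estimated normal should point radially outward, so it matches the x and y coordinates of its sample point, which gives a quick sanity check on the sketch.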
Problem

Research questions and friction points this paper is trying to address.

Reading B-rep data in STEP format requires a CAD kernel, often with costly per-node licenses
Existing STEP-based datasets are hard to deploy in large-scale learning pipelines on computing clusters
There is no open format for efficient processing of CAD data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses HDF5 format for CAD data
Provides open-source Python processing library
Converts the Fusion 360 and ABC datasets from STEP to the new format for use in machine learning
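To give a sense of why an array-oriented container such as HDF5 can hold B-rep data, the sketch below stores the topology of a single square face as flat index tables (vertices, edges, faces) and checks that the indices are consistent. This is only an assumed, simplified layout for illustration: the key names and structure are hypothetical and do not describe the schema used by the paper.

```python
# Hypothetical flat layout for one square face; all key names are illustrative.
brep = {
    "vertices": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)],
    "edges": [(0, 1), (1, 2), (2, 3), (3, 0)],  # (start, end) vertex indices
    "faces": [[0, 1, 2, 3]],                    # one closed loop of edge indices
}

def face_vertex_loop(model, face_id):
    """Return the ordered start-vertex indices along a face's edge loop."""
    return [model["edges"][e][0] for e in model["faces"][face_id]]

def is_consistent(model):
    """Check that all indices are in range and every face loop is closed."""
    n_vertices = len(model["vertices"])
    n_edges = len(model["edges"])
    for start, end in model["edges"]:
        if not (0 <= start < n_vertices and 0 <= end < n_vertices):
            return False
    for loop in model["faces"]:
        if any(not 0 <= e < n_edges for e in loop):
            return False
        for k, e in enumerate(loop):
            next_edge = loop[(k + 1) % len(loop)]
            # Each edge must end where the next edge in the loop starts.
            if model["edges"][e][1] != model["edges"][next_edge][0]:
                return False
    return True
```

Because every table here is a plain list of numbers or index tuples, each one could in principle be written as a separate array dataset, which is the kind of data HDF5 is designed to store.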
Authors

Nafiseh Izadyar
Department of Computer Science, University of Victoria

Sai Chandra Madduri
Department of Computer Science, University of Victoria

Teseo Schneider
Assistant Professor at UVic
Geometry Processing, Computer Graphics, Numerical Simulations, Computational Science, Meshing