🤖 AI Summary
Industrial CAD models often lack semantic, spatial, and functional information, which limits their utility in robotic simulation and high-level scene understanding. This work proposes an offline method that applies large vision-language models (LVLMs) to CAD environments, described here as a first, to automatically generate structured 3D scene graphs that explicitly model manipulable objects and their functional relationships. By combining semantic parsing with functional reasoning, the approach produces semantic annotations and recognizes relationships in industrial structures such as piping systems. The semantic labels are evaluated quantitatively, and the resulting scene graphs qualitatively. The authors state that the code, results, and environment will be made publicly available.
📝 Abstract
Functional elements in an industrial environment, such as displays and interactive valves, offer effective opportunities for robot training. When preparing simulations for robots or applications that involve high-level scene understanding, the simulation environment must be equally detailed. Although CAD files for such environments deliver an exact description of the geometry and visuals, they usually lack semantic, relational and functional information, which limits the simulation and training possibilities. A 3D scene graph, built by enriching the environment with a Large Vision-Language Model (LVLM), can organize semantic, spatial and functional information. In this paper, we present an offline approach to creating detailed 3D scene graphs from CAD environments. This serves as a foundation for including the relations of functional and actionable elements, which can then be used for dynamic simulation and reasoning. Key results of this research include both a quantitative evaluation of the generated semantic labels and a qualitative evaluation of the scene graph, especially with regard to pipe structures and identified functional relations. All code, results and the environment will be made available at https://cad-scenegraph.github.io