🤖 AI Summary
To address surface irregularity and excessive storage overhead in AI-driven 3D generation, this paper proposes a multi-stage framework that generates 3D models from parametric primitives under text and image guidance. The method uses differentiable parametric primitives (such as spheres, cylinders, and planes) as geometric building blocks and combines feature-aware recognition with robust primitive fitting to achieve high geometric fidelity and C¹-smooth surfaces. A minimalist parameter encoding stores only primitive type, pose, and scale, reducing the representation to about 6 KB. Evaluated on both synthetic and real-world datasets, the method achieves state-of-the-art performance (Chamfer Distance = 0.003092, VIoU = 0.545, F1-Score = 0.9139, Normal Consistency = 0.8369), outperforming leading implicit and mesh-based approaches. This work pioneers the systematic integration of parametric primitive modeling into cross-modal 3D generation, combining high-fidelity reconstruction with an ultra-lightweight representation suited to real-time prototyping and edge deployment.
📝 Abstract
Recent advances in AI-driven 3D model generation have leveraged cross-modal inputs, yet generating models with smooth surfaces while minimizing storage overhead remains a challenge. This paper introduces a novel multi-stage framework for generating 3D models composed of parameterized primitives, guided by textual and image inputs. Within the framework, we propose a model generation algorithm based on parameterized primitives that identifies the shape features of a model's constituent elements and replaces those elements with parameterized primitives that have high-quality surfaces. We also propose a corresponding model storage method that preserves the original surface quality of the model while retaining only the parameters of the parameterized primitives. Experiments on a virtual-scene dataset and a real-scene dataset demonstrate the effectiveness of our method, which achieves a Chamfer Distance of 0.003092, a VIoU of 0.545, an F1-Score of 0.9139, and a Normal Consistency (NC) of 0.8369, with primitive parameter files of approximately 6 KB. Our approach is particularly suitable for rapid prototyping of simple models.
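To make the storage idea concrete, the following is a minimal sketch of how a model could be saved as primitive parameters only. It is not the paper's file format: the type codes, the 41-byte record layout (one type byte, then position, quaternion rotation, and scale as float32 values), and the names `Primitive`, `encode`, and `decode` are all assumptions chosen for illustration.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical type codes; the paper's actual encoding is not specified here.
PRIMITIVE_TYPES = {"sphere": 0, "cylinder": 1, "plane": 2}
TYPE_NAMES = {code: name for name, code in PRIMITIVE_TYPES.items()}

# Assumed record layout, little-endian with no padding:
# 1 unsigned byte (type) + 10 float32 values
# (3 position, 4 quaternion rotation, 3 scale) = 41 bytes per primitive.
RECORD = struct.Struct("<B10f")


@dataclass
class Primitive:
    kind: str
    position: Tuple[float, float, float]
    rotation: Tuple[float, float, float, float]  # quaternion (w, x, y, z)
    scale: Tuple[float, float, float]


def encode(primitives: List[Primitive]) -> bytes:
    """Pack each primitive into one fixed-size binary record."""
    return b"".join(
        RECORD.pack(PRIMITIVE_TYPES[p.kind], *p.position, *p.rotation, *p.scale)
        for p in primitives
    )


def decode(blob: bytes) -> List[Primitive]:
    """Rebuild the primitive list from the packed records."""
    primitives = []
    for fields in RECORD.iter_unpack(blob):
        code, values = fields[0], fields[1:]
        primitives.append(
            Primitive(
                kind=TYPE_NAMES[code],
                position=tuple(values[0:3]),
                rotation=tuple(values[3:7]),
                scale=tuple(values[7:10]),
            )
        )
    return primitives


if __name__ == "__main__":
    model = [
        Primitive("sphere", (0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0), (0.5, 0.5, 0.5)),
        Primitive("cylinder", (0.0, 1.0, 0.0), (1.0, 0.0, 0.0, 0.0), (0.25, 2.0, 0.25)),
    ]
    blob = encode(model)
    print(len(blob), "bytes for", len(model), "primitives")
```

Under this assumed layout the file size grows linearly with the number of primitives and is independent of surface resolution, because the surface is re-evaluated from the parameters at load time. At 41 bytes per record, a file of about 6 KB would hold roughly 150 primitives, although the real per-primitive cost depends on the format the authors actually use.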