🤖 AI Summary
Existing 3D datasets and models predominantly emphasize geometric structure while neglecting material properties—key determinants of visual appearance. This work introduces the first zero-shot, two-stage inference framework that jointly infers semantic categories and material compositions directly from 3D point clouds with coarse segmentation. Our core innovation leverages large language models (LLMs) as universal priors: Stage I generates descriptive semantic labels via LLM prompting; Stage II conditions material assignment on these semantic labels, without fine-tuning or task-specific training. By decoupling “what the object is” from “what it is made of,” our method achieves cross-modal alignment between geometry and material. We evaluate on 1,000 shapes from Fusion/ABS and ShapeNet, demonstrating high plausibility in both semantic and material inference. Assessment employs the LLM-as-a-Judge paradigm from DeepEval. To our knowledge, this is the first work to empirically validate LLMs’ capability to bridge 3D geometric understanding and material cognition.
📝 Abstract
Most existing 3D shape datasets and models focus solely on geometry, overlooking the material properties that determine how objects appear. We introduce a two-stage large language model (LLM) based method for inferring material composition directly from 3D point clouds with coarse segmentations. Our key insight is to decouple reasoning about what an object is from what it is made of. In the first stage, an LLM predicts the object's semantic category; in the second stage, it assigns plausible materials to each geometric segment, conditioned on the inferred semantics. Both stages operate in a zero-shot manner, without task-specific training. Because existing datasets lack reliable material annotations, we evaluate our method using an LLM-as-a-Judge protocol implemented in DeepEval. Across 1,000 shapes from Fusion/ABS and ShapeNet, our method achieves high semantic and material plausibility. These results demonstrate that language models can serve as general-purpose priors for bridging geometric reasoning and material understanding in 3D data.
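The two-stage pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `query_llm` is a hypothetical stub standing in for a real LLM API call, and the segment names, prompts, and returned labels are invented for the example.

```python
def query_llm(prompt: str) -> str:
    # Stub standing in for a real LLM API call; returns canned
    # answers so the sketch is self-contained and runnable.
    if "What object" in prompt:
        return "chair"
    return "wood"

def infer_semantics(segments: list[str]) -> str:
    # Stage I: describe the coarse segments and ask the LLM
    # for a semantic label ("what the object is").
    prompt = f"What object has these parts? Segments: {segments}"
    return query_llm(prompt)

def infer_materials(label: str, segments: list[str]) -> dict[str, str]:
    # Stage II: assign a material to each segment, conditioned
    # on the Stage I label ("what it is made of").
    return {
        seg: query_llm(
            f"A {label}'s part '{seg}' is most plausibly made of what material?"
        )
        for seg in segments
    }

segments = ["seat", "backrest", "legs"]
label = infer_semantics(segments)       # Stage I output, e.g. "chair"
materials = infer_materials(label, segments)
```

Both stages are plain prompting with no fine-tuning, which is what makes the method zero-shot; only the prompt to Stage II carries the Stage I result forward.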