🤖 AI Summary
Existing 3D food volume estimation methods lack text-driven object selection capability, hindering user-specified food targeting. This paper proposes the first text-driven food volume estimation framework: given a food name, the system performs text-image cross-modal segmentation to precisely localize the target instance, followed by NeRF-based neural surface reconstruction to generate high-fidelity 3D meshes; volume is then computed via mesh discretization and voxelization. Our core contribution is the novel coupling of text-guided segmentation with neural surface reconstruction for food volume estimation—enabling fine-grained, user-specifiable, instance-level 3D modeling. Evaluated on the MetaFood3D dataset, our method achieves significant improvements in target isolation and surface reconstruction accuracy, reducing volume estimation error by 32.7% compared to prior approaches. The end-to-end pipeline supports real-world dietary analysis with precise, semantically controllable food quantification.
📝 Abstract
Accurate food volume estimation is crucial for dietary monitoring, medical nutrition management, and food intake analysis. Existing 3D Food Volume estimation methods accurately compute the food volume but lack for food portions selection. We present VolTex, a framework that improves change{the food object selection} in food volume estimation. Allowing users to specify a target food item via text input to be segmented, our method enables the precise selection of specific food objects in real-world scenes. The segmented object is then reconstructed using the Neural Surface Reconstruction method to generate high-fidelity 3D meshes for volume computation. Extensive evaluations on the MetaFood3D dataset demonstrate the effectiveness of our approach in isolating and reconstructing food items for accurate volume estimation. The source code is accessible at https://github.com/GCVCG/VolTex.