Implicit-Scale 3D Reconstruction for Multi-Food Volume Estimation from Monocular Images

📅 2026-02-13
📈 Citations: 0
Influential: 0

📝 Abstract
We present Implicit-Scale 3D Reconstruction from Monocular Multi-Food Images, a benchmark dataset designed to advance geometry-based food portion estimation in realistic dining scenarios. Existing dietary assessment methods largely rely on single-image analysis or appearance-based inference, including recent vision-language models, which lack explicit geometric reasoning and are sensitive to scale ambiguity. This benchmark reframes food portion estimation as an implicit-scale 3D reconstruction problem under monocular observations. To reflect real-world conditions, explicit physical references and metric annotations are removed; instead, contextual objects such as plates and utensils are provided, requiring algorithms to infer scale from implicit cues and prior knowledge. The dataset emphasizes multi-food scenes with diverse object geometries, frequent occlusions, and complex spatial arrangements. The benchmark was adopted as a challenge at the MetaFood 2025 Workshop, where multiple teams proposed reconstruction-based solutions. Experimental results show that while strong vision-language baselines achieve competitive performance, geometry-based reconstruction methods provide both improved accuracy and greater robustness, with the top-performing approach achieving a MAPE of 0.21 in volume estimation and an L1 Chamfer Distance of 5.7 in geometric accuracy.
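The abstract reports two evaluation metrics: MAPE for volume estimates and L1 Chamfer Distance for geometric accuracy. The sketch below shows standard definitions of both; the benchmark's exact conventions (units, point-cloud sampling density, whether the Chamfer Distance is symmetric or one-sided) are assumptions not stated in this card.

```python
import numpy as np

def mape(pred_volumes, true_volumes):
    """Mean Absolute Percentage Error over per-food volume estimates."""
    pred = np.asarray(pred_volumes, dtype=float)
    true = np.asarray(true_volumes, dtype=float)
    return float(np.mean(np.abs(pred - true) / true))

def l1_chamfer(points_a, points_b):
    """Symmetric L1 Chamfer Distance between point clouds of shape (N, 3) and (M, 3).

    For each point, find its nearest neighbor in the other cloud under the
    L1 norm, then average the nearest-neighbor distances in both directions.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Pairwise L1 distance matrix of shape (N, M) via broadcasting.
    d = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

# Toy usage: two foods, predicted vs. ground-truth volumes (same units).
print(mape([210.0, 95.0], [200.0, 100.0]))  # 0.05
```

Identical point clouds yield a Chamfer Distance of zero, so lower is better for both metrics; a MAPE of 0.21 corresponds to a 21% average relative volume error.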
Problem

Research questions and friction points this paper is trying to address.

food volume estimation
monocular 3D reconstruction
scale ambiguity
multi-food scenes
implicit-scale
Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit-Scale 3D Reconstruction
Monocular Food Volume Estimation
Geometry-Based Dietary Assessment
Multi-Food Scene Understanding
Scale-Ambiguity Resolution