Implicit-Scale 3D Reconstruction for Multi-Food Volume Estimation from Monocular Images

📅 2026-02-13
📈 Citations: 0
Influential: 0

📝 Abstract
We present Implicit-Scale 3D Reconstruction from Monocular Multi-Food Images, a benchmark dataset designed to advance geometry-based food portion estimation in realistic dining scenarios. Existing dietary assessment methods largely rely on single-image analysis or appearance-based inference, including recent vision-language models, which lack explicit geometric reasoning and are sensitive to scale ambiguity. This benchmark reframes food portion estimation as an implicit-scale 3D reconstruction problem under monocular observations. To reflect real-world conditions, explicit physical references and metric annotations are removed; instead, contextual objects such as plates and utensils are provided, requiring algorithms to infer scale from implicit cues and prior knowledge. The dataset emphasizes multi-food scenes with diverse object geometries, frequent occlusions, and complex spatial arrangements. The benchmark was adopted as a challenge at the MetaFood 2025 Workshop, where multiple teams proposed reconstruction-based solutions. Experimental results show that while strong vision-language baselines achieve competitive performance, geometry-based reconstruction methods provide both improved accuracy and greater robustness, with the top-performing approach achieving a MAPE of 0.21 in volume estimation and an L1 Chamfer Distance of 5.7 in geometric accuracy.
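The abstract reports two evaluation metrics: MAPE for volume estimates and L1 Chamfer Distance for geometric accuracy. The sketch below shows standard definitions of both; the benchmark's exact conventions (units, point-cloud sampling density, whether the Chamfer Distance is symmetric or one-sided) are assumptions not stated in this card.

```python
import numpy as np

def mape(pred_volumes, true_volumes):
    """Mean Absolute Percentage Error over per-food volume estimates."""
    pred = np.asarray(pred_volumes, dtype=float)
    true = np.asarray(true_volumes, dtype=float)
    return float(np.mean(np.abs(pred - true) / true))

def l1_chamfer(points_a, points_b):
    """Symmetric L1 Chamfer Distance between point clouds of shape (N, 3) and (M, 3).

    For each point, find its nearest neighbor in the other cloud under the
    L1 norm, then average the nearest-neighbor distances in both directions.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Pairwise L1 distance matrix of shape (N, M) via broadcasting.
    d = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=-1)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())

# Toy usage: two foods, predicted vs. ground-truth volumes (same units).
print(mape([210.0, 95.0], [200.0, 100.0]))  # 0.05
```

Identical point clouds yield a Chamfer Distance of zero, so lower is better for both metrics; a MAPE of 0.21 corresponds to a 21% average relative volume error.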
Problem

Research questions and friction points this paper is trying to address.

food volume estimation
monocular 3D reconstruction
scale ambiguity
multi-food scenes
implicit-scale
Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit-Scale 3D Reconstruction
Monocular Food Volume Estimation
Geometry-Based Dietary Assessment
Multi-Food Scene Understanding
Scale-Ambiguity Resolution