🤖 AI Summary
Existing food logging methods, such as handwritten journals and unimodal mobile applications, suffer from low user adherence and insufficient contextual information, leading to inaccurate nutritional assessments. To address this, we propose SnappyMeal, a multimodal AI-based dietary tracking system featuring a goal-dependent follow-up questioning mechanism. The system jointly processes user-uploaded grocery receipts, food images, and textual descriptions, and enriches dietary context by aligning logged items with structured nutrition databases. This enables fine-grained food identification, personalized nutrient matching, and context-aware reasoning. Deployed in a real-world three-week study, the system collected over 500 dietary records; users praised the multimodal input options and reported strong perceived accuracy. To our knowledge, this is the first work to co-model structured receipt data with multimodal AI for dietary tracking, improving both logging completeness and the semantic understanding of eating behaviors.
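As a rough illustration of the receipt-to-database alignment described above, the sketch below fuzzily matches OCR'd grocery-receipt line items against a toy nutrition table. All names, data, and thresholds are illustrative assumptions; the paper does not specify its retrieval implementation.

```python
# Hypothetical sketch: aligning OCR'd receipt line items with a structured
# nutrition database via fuzzy string matching. Data and thresholds are
# illustrative; SnappyMeal's actual retrieval pipeline may differ.
from difflib import SequenceMatcher

# Toy stand-in for a structured nutrition database (per-100g values).
NUTRITION_DB = {
    "greek yogurt, plain, nonfat": {"kcal": 59, "protein_g": 10.2},
    "banana, raw": {"kcal": 89, "protein_g": 1.1},
    "bread, whole wheat": {"kcal": 247, "protein_g": 13.0},
}

def match_receipt_item(receipt_text: str, min_score: float = 0.4):
    """Return the best-matching database entry for one receipt line item."""
    best_key, best_score = None, 0.0
    for key in NUTRITION_DB:
        score = SequenceMatcher(None, receipt_text.lower(), key).ratio()
        if score > best_score:
            best_key, best_score = key, score
    # Fall back to None when no entry is close enough to trust.
    if best_score < min_score:
        return None, None
    return best_key, NUTRITION_DB[best_key]

for line in ["GRK YOGRT PLN NF 32OZ", "BANANAS", "WW BREAD LOAF"]:
    key, facts = match_receipt_item(line)
    print(f"{line!r} -> {key} {facts}")
```

A deployed system would more plausibly query a full database such as USDA FoodData Central and use learned embeddings rather than character-level similarity, but the matching step it performs is the same in shape.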
📝 Abstract
Food logging, both self-directed and prescribed, plays a critical role in uncovering correlations between diet and medical, fitness, and health outcomes. Through conversations with nutrition experts and individuals who practice dietary tracking, we find that current logging methods, such as handwritten and app-based journaling, are inflexible, resulting in low adherence and potentially inaccurate nutritional summaries. These findings, corroborated by prior literature, emphasize the urgent need for improved food logging methods. In response, we propose SnappyMeal, an AI-powered dietary tracking system that leverages multimodal inputs to let users log their food intake more flexibly. SnappyMeal introduces goal-dependent follow-up questions to intelligently seek missing context from the user, and information retrieval from user grocery receipts and nutritional databases to improve accuracy. We evaluate SnappyMeal on publicly available nutrition benchmarks and through a multi-user, 3-week, in-the-wild deployment capturing over 500 logged food instances. Users strongly praised the multiple available input methods and reported strong perceived accuracy. These insights suggest that multimodal AI systems can substantially improve the flexibility and context-awareness of dietary tracking, laying the groundwork for a new class of intelligent self-tracking applications.
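To make the goal-dependent questioning mechanism concrete, here is a minimal sketch: given a partially specified food log and the user's tracking goal, it asks only for the context that goal still needs. The goals, field names, and question templates are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of goal-dependent follow-up questioning. Fields,
# goals, and templates are illustrative assumptions, not SnappyMeal's.
from dataclasses import dataclass
from typing import Optional

@dataclass
class FoodLog:
    food_name: str
    portion: Optional[str] = None        # e.g., "1 cup"
    preparation: Optional[str] = None    # e.g., "grilled"
    sodium_source: Optional[str] = None  # e.g., "added table salt"

# Different tracking goals require different pieces of missing context.
GOAL_REQUIREMENTS = {
    "weight_loss": ["portion", "preparation"],
    "low_sodium": ["portion", "sodium_source"],
}

QUESTION_TEMPLATES = {
    "portion": "About how much {food} did you have?",
    "preparation": "How was the {food} prepared (e.g., fried, grilled)?",
    "sodium_source": "Was any salt or salty sauce added to the {food}?",
}

def follow_up_questions(log: FoodLog, goal: str) -> list[str]:
    """Ask only about fields the user's goal requires but the log lacks."""
    missing = [f for f in GOAL_REQUIREMENTS[goal] if getattr(log, f) is None]
    return [QUESTION_TEMPLATES[f].format(food=log.food_name) for f in missing]

log = FoodLog(food_name="chicken salad", portion="1 bowl")
print(follow_up_questions(log, goal="low_sodium"))
# -> ['Was any salt or salty sauce added to the chicken salad?']
```

The design point this illustrates is that the question set is conditioned on the user's goal rather than fixed, so a low-sodium tracker is never asked cooking-method questions that only matter for calorie estimation.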