Evaluating the Feasibility of Inferring Dietary Behavior Change Receptivity from Egocentric Images of Eating Environment

📅 2026-05-27

🤖 AI Summary

This study addresses two limitations of traditional self-report methods for assessing dietary behavior change receptivity: responses are sparse and delayed, which hinders real-time intervention. For the first time, it explores using first-person images of the eating environment, captured by a wearable camera, to passively infer an individual's receptivity to behavior change along three dimensions: awareness, interaction capability, and motivation. The proposed approach is a transfer learning framework that pairs a pre-trained CLIP vision encoder with a lightweight transformer classifier to model sequences of eating-episode images. Preliminary experiments on pilot data show promising improvements over simple baseline models on the receptivity indicators. These results suggest that unobtrusive inference of dietary behavior change receptivity from visual cues may be feasible, and they point to a possible pathway for intelligent health interventions.

📝 Abstract

Accurately assessing dietary behavior change receptivity is essential for designing effective just-in-time adaptive interventions (JITAIs) that promote healthier eating habits. However, self-report-based assessment of behavior change receptivity is sparse and delayed, limiting its practical use in continuous monitoring. To explore whether passive sensing may help address this challenge, this study conducts a pilot investigation of inferring participants' self-reported behavior change receptivity from egocentric eating images collected by a wearable camera. We use pilot data obtained from free-living eating episodes using the Automatic Ingestion Monitor v2 (AIM-2). The data include egocentric image sequences captured during eating, paired with responses to questions assessing specific dimensions of behavior change receptivity (awareness, interaction capability, and motivation). To examine whether the visual information is related to these responses, we evaluate a transfer-learning-assisted framework that combines a pre-trained Contrastive Language-Image Pre-Training (CLIP) vision encoder with a lightweight transformer classifier. The model processes eating episode image sequences to extract potential semantic and temporal cues related to behavior change receptivity. Preliminary experimental results show promising improvements over simple baseline models for behavior change receptivity indicators. These early findings suggest that egocentric eating episode images may contain cues related to dietary behavior change receptivity, and they warrant further investigation with larger and more comprehensive datasets.
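
The sequence-modeling idea in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. It assumes each frame of an eating episode has already been turned into an embedding vector (random placeholders here, standing in for the outputs of a frozen CLIP vision encoder), uses a single self-attention step with no learned projections in place of the lightweight transformer, and assumes each receptivity dimension is a binary label. The names `predict_receptivity`, `heads`, and `DIMENSIONS` are hypothetical.

```python
import math
import random

# Assumption: one binary label per receptivity dimension named in the abstract.
DIMENSIONS = ("awareness", "interaction_capability", "motivation")


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity projections.

    A real transformer layer would add learned query/key/value projections,
    a feed-forward block, and normalization; they are omitted for brevity.
    """
    scale = math.sqrt(len(frames[0]))
    attended = []
    for query in frames:
        weights = softmax([dot(query, key) / scale for key in frames])
        attended.append([
            sum(w * value[i] for w, value in zip(weights, frames))
            for i in range(len(query))
        ])
    return attended


def mean_pool(frames):
    """Average the per-frame vectors into one episode-level vector."""
    return [sum(f[i] for f in frames) / len(frames) for i in range(len(frames[0]))]


def predict_receptivity(frames, heads):
    """Return one probability per receptivity dimension for an eating episode."""
    episode = mean_pool(self_attention(frames))
    return {
        name: 1.0 / (1.0 + math.exp(-(dot(weights, episode) + bias)))
        for name, (weights, bias) in heads.items()
    }


# Toy episode: 6 frames with 8-dimensional placeholder embeddings (not real CLIP outputs).
rng = random.Random(0)
EMBED_DIM = 8
episode_frames = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(6)]
# Untrained, randomly initialized linear heads, one per dimension.
heads = {
    name: ([rng.gauss(0, 0.1) for _ in range(EMBED_DIM)], 0.0)
    for name in DIMENSIONS
}
probs = predict_receptivity(episode_frames, heads)
```

Keeping the image encoder frozen and training only a small sequence model on top is a common transfer-learning choice when labeled data is scarce, as in a pilot study. In a fuller version of this sketch, the heads and attention parameters would be learned from the paired self-report responses.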

Problem

Research questions and friction points this paper is trying to address.

dietary behavior change receptivity

egocentric images

just-in-time adaptive interventions

passive sensing

eating environment

Innovation

Methods, ideas, or system contributions that make the work stand out.

egocentric vision

behavior change receptivity

CLIP