Exploring Object Status Recognition for Recipe Progress Tracking in Non-Visual Cooking

πŸ“… 2025-07-04
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the non-visual cooking needs of blind and low-vision (BLV) individuals by proposing OSCAR, a framework that systematically investigates object status recognition for recipe progress tracking. To address the scarcity of real-world non-visual cooking data and fine-grained status modeling, the authors introduce a benchmark comprising 173 instructional videos and 12 authentic cooking sessions recorded by BLV participants in their homes. OSCAR integrates recipe parsing, object status extraction, visual alignment of observations with cooking steps, and time-causal modeling to enable step-level status awareness and real-time progress tracking. Experiments show that incorporating object status information consistently improves step prediction accuracy across the evaluated vision-language models. The analysis further identifies factors that affect real-world performance, such as implicit tasks, camera placement, and lighting, yielding design considerations for robust deployment. The work establishes a technical foundation for scalable, context-aware assistive cooking systems.
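For intuition, here is a minimal sketch of the first two pipeline stages, recipe parsing and object status extraction, under the assumption that a caller-supplied language model fills in the statuses. The `RecipeStep` schema, the prompt, and the function names are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of recipe parsing and object status extraction.
# The RecipeStep schema and STATUS_PROMPT are assumptions for illustration;
# OSCAR's actual representation and prompting may differ.
from dataclasses import dataclass, field

@dataclass
class RecipeStep:
    index: int
    instruction: str                                      # e.g., "Dice the onion"
    object_statuses: list = field(default_factory=list)   # e.g., ["onion: diced"]

def parse_recipe(recipe_text: str) -> list:
    """Split a free-form recipe into ordered steps, one per non-empty line."""
    lines = [ln.strip() for ln in recipe_text.splitlines() if ln.strip()]
    return [RecipeStep(index=i, instruction=ln) for i, ln in enumerate(lines)]

STATUS_PROMPT = (
    "For the cooking step below, list each ingredient or tool and the "
    "status it should have once the step is done, as 'object: status'.\n"
    "Step: {instruction}"
)

def extract_statuses(step: RecipeStep, llm_call) -> RecipeStep:
    """Fill in a step's expected object statuses using a caller-supplied
    language-model function (text in, text out)."""
    reply = llm_call(STATUS_PROMPT.format(instruction=step.instruction))
    step.object_statuses = [ln.strip() for ln in reply.splitlines() if ":" in ln]
    return step
```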

πŸ“ Abstract
Cooking plays a vital role in everyday independence and well-being, yet remains challenging for people with vision impairments due to limited support for tracking progress and receiving contextual feedback. Object status - the condition or transformation of ingredients and tools - offers a promising but underexplored foundation for context-aware cooking support. In this paper, we present OSCAR (Object Status Context Awareness for Recipes), a technical pipeline that explores the use of object status recognition to enable recipe progress tracking in non-visual cooking. OSCAR integrates recipe parsing, object status extraction, visual alignment with cooking steps, and time-causal modeling to support real-time step tracking. We evaluate OSCAR on 173 instructional videos and a real-world dataset of 12 non-visual cooking sessions recorded by blind and low-vision (BLV) individuals in their homes. Our results show that object status consistently improves step prediction accuracy across vision-language models, and reveal key factors that impact performance in real-world conditions, such as implicit tasks, camera placement, and lighting. We contribute a pipeline for context-aware recipe progress tracking, an annotated real-world non-visual cooking dataset, and design insights to guide future context-aware assistive cooking systems.
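As one way to picture the visual-alignment stage, the toy scorer below compares a vision-language model's description of the current frame against each step's expected object statuses. A real system would use learned similarity rather than string overlap; `describe_frame` and the score function are assumptions, not the paper's API.

```python
# Toy sketch of aligning a camera frame with recipe steps via object status.
# describe_frame stands in for a vision-language model call; the set-overlap
# scorer is a deliberately simple stand-in for learned similarity.

def describe_frame(frame) -> str:
    """Placeholder for a VLM call reporting visible objects and statuses,
    e.g. 'onion: diced, pan: empty'."""
    raise NotImplementedError("plug in a vision-language model here")

def alignment_scores(frame_description: str, step_statuses) -> list:
    """Score each step by the fraction of its expected 'object: status'
    pairs that appear in the frame description (higher = better match).
    step_statuses: per step, a list of expected 'object: status' strings,
    e.g. the object_statuses filled in by the earlier sketch."""
    observed = {s.strip().lower() for s in frame_description.split(",")}
    scores = []
    for expected in step_statuses:
        wanted = {s.strip().lower() for s in expected}
        scores.append(len(observed & wanted) / max(len(wanted), 1))
    return scores
```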
Problem

Research questions and friction points this paper addresses.

Recognizing object status for non-visual cooking progress tracking
Providing context-aware support for visually impaired cooks
Improving recipe step prediction accuracy with vision-language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Object status recognition as the basis for recipe progress tracking
Integrated pipeline of recipe parsing, object status extraction, and visual alignment
Time-causal modeling for real-time step tracking (sketched below)
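To make the time-causal idea concrete, here is a minimal sketch of a causal step tracker: it consumes per-frame alignment scores (for instance, from the scorer sketched after the abstract) and advances monotonically using only past and present frames, never future ones. The threshold and the assumption that steps are never skipped are illustrative choices, not details from the paper.

```python
# Minimal sketch of time-causal step tracking. Assumes cooking steps
# progress monotonically and are never skipped; threshold is illustrative.
def track_step(scores_per_frame, advance_threshold=0.6):
    """Given, per frame, a list of alignment scores (one per recipe step),
    emit the current step index using only past and present frames."""
    current = 0
    history = []
    for scores in scores_per_frame:
        nxt = current + 1
        # Advance only when the next step is both confident and dominant.
        if nxt < len(scores) and scores[nxt] >= advance_threshold \
                and scores[nxt] > scores[current]:
            current = nxt
        history.append(current)
    return history

# Example: three steps, five frames of toy alignment scores.
frames = [
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.1],
    [0.4, 0.7, 0.1],  # step 1 becomes dominant -> advance
    [0.2, 0.8, 0.3],
    [0.1, 0.3, 0.9],  # step 2 becomes dominant -> advance
]
print(track_step(frames))  # [0, 0, 1, 1, 2]
```

Because the tracker never looks ahead, it can run live during a cooking session; the trade-off of the no-skipping assumption is that implicit or merged steps (one of the failure factors the paper identifies) would require a more permissive transition model.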