OSCAR: Object Status and Contextual Awareness for Recipes to Support Non-Visual Cooking

📅 2025-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the need for cooking assistance among visually impaired users by proposing a non-visual method for tracking recipe execution, grounded in object-status awareness. Unlike conventional approaches that rely on text parsing or action recognition, the method tracks step progression contextually by perceiving state transitions of utensils and ingredients in real time. Technically, it integrates large language models (LLMs) and vision-language models (VLMs) for recipe semantic parsing, multi-frame visual-state alignment, and dynamic progress logging. The authors introduce the first real-world non-visual cooking video benchmark, comprising 12 authentic recordings, and evaluate on 173 YouTube videos plus the 12 in-house samples. The method achieves over 20% higher step-tracking accuracy than baselines across different VLMs, and ablation studies identify state-modeling granularity and cross-modal alignment quality as critical determinants of performance.
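The page does not include code; purely to illustrate the pipeline described above, here is a minimal Python sketch of an object-status progress tracker. Everything in it is hypothetical (`RecipeStep`, `ProgressTracker`, the hand-written statuses): in OSCAR itself, the target statuses would come from an LLM parsing the recipe, and the observed statuses from a VLM queried about each video frame.

```python
from dataclasses import dataclass, field

@dataclass
class RecipeStep:
    instruction: str          # raw step text from the recipe
    target_states: list[str]  # object statuses an LLM might extract, e.g. "onion: diced"

@dataclass
class ProgressTracker:
    steps: list[RecipeStep]
    current: int = 0
    log: list[str] = field(default_factory=list)

    def update(self, frame_id: int, observed: set[str]) -> None:
        # Advance once every target status of the current step is observed.
        # In OSCAR, `observed` would be produced by a VLM describing the frame.
        if self.current >= len(self.steps):
            return
        step = self.steps[self.current]
        if all(state in observed for state in step.target_states):
            self.log.append(f"frame {frame_id}: completed '{step.instruction}'")
            self.current += 1

# Hand-written stand-ins for LLM/VLM output, purely illustrative.
tracker = ProgressTracker([
    RecipeStep("Dice the onion", ["onion: diced"]),
    RecipeStep("Saute until translucent", ["onion: translucent"]),
])
tracker.update(frame_id=12, observed={"onion: diced"})
tracker.update(frame_id=57, observed={"onion: translucent"})
print(tracker.log)
```

The step pointer only advances when all target statuses are seen, which is what lets the tracker produce a progress log rather than a per-frame action label.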

📝 Abstract
Following recipes while cooking is an important but difficult task for visually impaired individuals. We developed OSCAR (Object Status Context Awareness for Recipes), a novel approach that provides recipe progress tracking and context-aware feedback on the completion of cooking tasks through tracking object statuses. OSCAR leverages both Large Language Models (LLMs) and Vision-Language Models (VLMs) to manipulate recipe steps, extract object status information, align visual frames with object status, and provide a cooking progress tracking log. We evaluated OSCAR's recipe-following functionality using 173 YouTube cooking videos and 12 real-world non-visual cooking videos to demonstrate OSCAR's capability to track cooking steps and provide contextual guidance. Our results highlight the effectiveness of using object status to improve performance compared to the baseline by over 20% across different VLMs, and we present factors that impact prediction performance. Furthermore, we contribute a dataset of real-world non-visual cooking videos with step annotations as an evaluation benchmark.
Problem

Research questions and friction points this paper is trying to address.

How can visually impaired individuals follow recipes and keep track of where they are while cooking?
How can a system track cooking progress and deliver context-aware feedback without relying on the user's vision?
How effective is object-status tracking for recipe guidance compared to existing baselines?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines LLMs (recipe parsing and object-status extraction) with VLMs (visual frame understanding) for recipe progress tracking.
Tracks object statuses, e.g., an onion going from "diced" to "translucent", as the signal of cooking progress; a toy alignment sketch follows this list.
Provides context-aware feedback for non-visual cooking.
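OSCAR queries VLMs directly about object statuses in frames. As one hedged way to picture the frame-to-status alignment step, the sketch below instead uses CLIP-style embedding cosine similarity; this is an assumption chosen for a self-contained example, not the paper's reported method, and the random embeddings stand in for real frame and text encoders.

```python
import numpy as np

def align_frames_to_states(frame_embs: np.ndarray, state_embs: np.ndarray) -> np.ndarray:
    """For each frame, return the index of the best-matching object-status
    description by cosine similarity (a stand-in for OSCAR's VLM query)."""
    f = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    s = state_embs / np.linalg.norm(state_embs, axis=1, keepdims=True)
    return (f @ s.T).argmax(axis=1)  # (num_frames,) indices into state_embs

# Toy example: 3 frames, 2 candidate statuses, random 8-d embeddings.
rng = np.random.default_rng(0)
print(align_frames_to_states(rng.normal(size=(3, 8)), rng.normal(size=(2, 8))))
```

With real encoders, the per-frame best-match indices would feed a tracker like the one sketched earlier, turning frame-level matches into step-level progress.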