A Highly Clean Recipe Dataset with Ingredient States Annotation for State Probing Task

📅 2025-07-23
🤖 AI Summary
Large language models (LLMs) struggle to accurately model the implicit intermediate state changes of ingredients during cooking, which hinders deep procedural understanding of culinary texts. To address this, we introduce J-CookState, the first high-quality, fine-grained annotated dataset of Japanese recipes, which explicitly labels the physical and chemical state transitions of ingredients at each step. We propose three novel evaluation tasks: state detection, state prediction, and state consistency verification. Our methodology combines structured text extraction with multi-round expert annotation and rigorous manual validation to ensure high data fidelity. Experiments on state-of-the-art open-weight models, including Llama3.1-70B and Qwen2.5-72B, show that incorporating ingredient state knowledge substantially improves procedural comprehension and brings state-tracking performance on par with leading commercial LLMs. This work pioneers systematic, fine-grained evaluation of how LLMs model real-world dynamic processes.

📝 Abstract
Large Language Models (LLMs) are trained on vast amounts of procedural text, but they do not directly observe real-world phenomena. In cooking recipes this poses a challenge: the intermediate states of ingredients are often omitted, which makes it difficult for models to track ingredient states and understand recipes accurately. In this paper, we apply state probing, a method for evaluating a language model's understanding of the world, to the domain of cooking. We propose a new task and dataset for evaluating how well LLMs can recognize intermediate ingredient states during cooking procedures. We first construct a new Japanese recipe dataset with clear and accurate annotations of ingredient state changes, collected from well-structured and controlled recipe texts. Using this dataset, we design three novel tasks to evaluate whether LLMs can track ingredient state transitions and identify the ingredients present at intermediate steps. Our experiments with widely used LLMs, such as Llama3.1-70B and Qwen2.5-72B, show that learning ingredient state knowledge improves their understanding of cooking processes and yields performance comparable to commercial LLMs.

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to track ingredient state changes in recipes
Creating a dataset with annotated ingredient states for state probing
Assessing LLM performance in understanding cooking process dynamics

Innovation

Methods, ideas, or system contributions that make the work stand out.

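Construct a Japanese recipe dataset with clear, accurate annotations of ingredient state changes
Design three tasks for tracking ingredient state transitions and identifying the ingredients present at intermediate steps
Evaluate widely used LLMs, such as Llama3.1-70B and Qwen2.5-72B, on ingredient state understanding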
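
To make the idea of state probing concrete, here is a minimal Python sketch of how per-step ingredient states might be represented and queried. Everything in it is an assumption made for illustration: the `Step` schema, the state labels, the `consumed` field, the prompt wording, and the example recipe are hypothetical and are not taken from the dataset or the paper's task definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    """One recipe step with hypothetical state annotations (illustrative schema)."""
    text: str
    # ingredient name -> state after this step (labels invented for this sketch)
    state_changes: dict[str, str] = field(default_factory=dict)
    # ingredients that stop existing on their own, e.g. merged into a mixture
    consumed: list[str] = field(default_factory=list)


def states_after(initial: dict[str, str], steps: list[Step], k: int) -> dict[str, str]:
    """Replay the first k steps and return the state of every ingredient still present."""
    states = dict(initial)
    for step in steps[:k]:
        for name in step.consumed:
            states.pop(name, None)
        states.update(step.state_changes)
    return states


def probe_prompt(steps: list[Step], k: int, ingredient: str) -> str:
    """Build a question about one ingredient after step k (wording is an assumption)."""
    shown = "\n".join(f"{i}. {s.text}" for i, s in enumerate(steps[:k], start=1))
    return f"{shown}\nQuestion: After step {k}, what is the state of the {ingredient}?"


# Toy recipe, invented for this example.
initial = {"onion": "raw", "egg": "raw"}
steps = [
    Step("Chop the onion.", {"onion": "chopped"}),
    Step("Fry the onion until soft.", {"onion": "fried"}),
    Step(
        "Beat the egg and pour it over the onion.",
        {"egg mixture": "combined"},
        consumed=["onion", "egg"],
    ),
]

gold = states_after(initial, steps, 2)  # reference answer for probes after step 2
prompt = probe_prompt(steps, 2, "onion")
```

A model's answer to `prompt` could then be compared with `gold["onion"]`, and the keys of `states_after(initial, steps, k)` give one possible reference for which ingredients are present at step `k`.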