CulinaryCut-VLAP: A Vision-Language-Action-Physics Framework for Food Cutting via a Force-Aware Material Point Method

πŸ“… 2026-01-10
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Food cutting entails highly nonlinear interactions between a knife and deformable objects, characterized by large deformations, frequent contact, and topological changes, which make stable and safe large-scale data collection in real-world settings extremely challenging. This work proposes a unified framework that, for the first time, integrates a force-aware, high-fidelity MLS-MPM physics simulation with a multimodal vision-language-action (VLA) dataset. By leveraging multi-view images, fine-grained language instructions, force-torque measurements, and tool-pose annotations, the framework generates physically consistent training signals. It enables accurate modeling of mechanical responses under topological changes and establishes the first reproducible benchmark encompassing diverse cutting trajectories and multimodal observations, thereby providing a safe and scalable foundation for VLA learning in deformable object manipulation.

πŸ“ Abstract
Food cutting is a highly practical yet underexplored application at the intersection of vision and robotic manipulation. The task remains challenging because interactions between the knife and deformable materials are highly nonlinear and often entail large deformations, frequent contact, and topological change, which in turn hinder stable and safe large-scale data collection. To address these challenges, we propose a unified framework that couples a vision-language-action (VLA) dataset with a physically realistic cutting simulator built on the material point method (MPM). Our simulator adopts MLS-MPM as its computational core, reducing numerical dissipation and energy drift while preserving rotational and shear responses even under topology-changing cuts. During cutting, forces and stress distributions are estimated from impulse exchanges between particles and the grid, enabling stable tracking of transient contact forces and energy transfer. We also provide a benchmark dataset that integrates diverse cutting trajectories, multi-view visual observations, and fine-grained language instructions, together with force-torque and tool-pose labels to provide physically consistent training signals. These components realize a learning-evaluation loop that respects the core physics of cutting and establishes a safe, reproducible, and scalable foundation for advancing VLA models in deformable object manipulation.
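The abstract's idea of estimating contact forces from impulse exchanges between particles and the grid can be illustrated with a minimal 2D MLS-MPM sketch. This is not the paper's simulator: the grid resolution, time step, the weakly compressible pressure-only material model, and the falling-blob-in-a-box scene are all illustrative assumptions. The key point is that the momentum a boundary condition removes from the grid in one step, divided by the time step, is a contact-force estimate.

```python
import numpy as np

# Minimal 2D MLS-MPM sketch (NumPy). Illustrative parameters only; the
# paper's actual material model, resolution, and scene are not reproduced.
n, dx = 32, 1.0 / 32
inv_dx, dt = 1.0 / dx, 2e-4
p_mass, p_vol = 1.0, (dx * 0.5) ** 2
E = 400.0                       # bulk stiffness (assumed)
gravity = np.array([0.0, -9.8])

rng = np.random.default_rng(0)
x = np.stack([rng.uniform(0.3, 0.6, 64),    # particle positions: a blob
              rng.uniform(0.05, 0.2, 64)], axis=1)  # resting near the floor
v = np.zeros_like(x)            # particle velocities
C = np.zeros((len(x), 2, 2))    # APIC affine velocity field
J = np.ones(len(x))             # volume ratio (det of deformation gradient)

def step():
    grid_v = np.zeros((n + 1, n + 1, 2))  # grid momentum, then velocity
    grid_m = np.zeros((n + 1, n + 1))     # grid mass
    # P2G: scatter mass and momentum with quadratic B-spline weights.
    for p in range(len(x)):
        base = (x[p] * inv_dx - 0.5).astype(int)
        fx = x[p] * inv_dx - base
        w = [0.5 * (1.5 - fx) ** 2, 0.75 - (fx - 1.0) ** 2, 0.5 * (fx - 0.5) ** 2]
        stress = -dt * 4 * inv_dx**2 * p_vol * E * (J[p] - 1.0)
        affine = stress * np.eye(2) + p_mass * C[p]
        for i in range(3):
            for j in range(3):
                offs = np.array([i, j])
                dpos = (offs - fx) * dx
                weight = w[i][0] * w[j][1]
                idx = tuple(base + offs)
                grid_v[idx] += weight * (p_mass * v[p] + affine @ dpos)
                grid_m[idx] += weight * p_mass
    # Grid update: momentum -> velocity, gravity, boundary conditions.
    # The impulse the boundary removes is the contact-force estimate.
    contact_impulse = np.zeros(2)
    for i in range(n + 1):
        for j in range(n + 1):
            if grid_m[i, j] > 0:
                grid_v[i, j] /= grid_m[i, j]
                grid_v[i, j] += dt * gravity
                before = grid_v[i, j].copy()
                if j < 3 and grid_v[i, j][1] < 0: grid_v[i, j][1] = 0
                if j > n - 3 and grid_v[i, j][1] > 0: grid_v[i, j][1] = 0
                if i < 3 and grid_v[i, j][0] < 0: grid_v[i, j][0] = 0
                if i > n - 3 and grid_v[i, j][0] > 0: grid_v[i, j][0] = 0
                contact_impulse += grid_m[i, j] * (grid_v[i, j] - before)
    # G2P: gather velocity and affine field, advect, update volume ratio.
    for p in range(len(x)):
        base = (x[p] * inv_dx - 0.5).astype(int)
        fx = x[p] * inv_dx - base
        w = [0.5 * (1.5 - fx) ** 2, 0.75 - (fx - 1.0) ** 2, 0.5 * (fx - 0.5) ** 2]
        new_v, new_C = np.zeros(2), np.zeros((2, 2))
        for i in range(3):
            for j in range(3):
                offs = np.array([i, j])
                dpos = (offs - fx) * dx
                weight = w[i][0] * w[j][1]
                g_v = grid_v[tuple(base + offs)]
                new_v += weight * g_v
                new_C += 4 * inv_dx**2 * weight * np.outer(g_v, dpos)
        v[p], C[p] = new_v, new_C
        x[p] += dt * v[p]
        J[p] *= 1 + dt * np.trace(C[p])
    return contact_impulse / dt  # force = impulse / dt

forces = [step() for _ in range(50)]
```

Since the blob only touches the floor in this scene, the vertical component of each returned force is non-negative (the floor pushes up), which is the kind of transient contact signal the dataset's force-torque labels are meant to capture.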
Problem

Research questions and friction points this paper is trying to address.

food cutting
deformable object manipulation
vision-language-action
physical simulation
topological change
Innovation

Methods, ideas, or system contributions that make the work stand out.

Material Point Method
Vision-Language-Action
Force-Aware Simulation
Deformable Object Manipulation
MLS-MPM
Hyunseo Koh
Gwangju Institute of Science and Technology (GIST), AI Graduate School
Chang-Yong Song
Vanderbilt University
Youngjae Choi
Soongsil University
Misa Viveiros
Vanderbilt University
David Hyde
Unknown affiliation
computational physics, fluid simulation, machine learning, high-performance computing
Heewon Kim
Soongsil University