Realtime Motion Generation with Active Perception Using Attention Mechanism for Cooking Robot

📅 2023-09-26
🏛️ arXiv.org
📈 Citations: 7
Influential: 1
🤖 AI Summary
Problem: Scrambled-egg preparation is challenging because the egg's state evolves continuously under heat, multimodal sensory input is noisy, the critical modality varies over time, and control must run in real time. Method: This paper proposes PredRNN-Attention, a predictive recurrent neural network with cross-modal attention, to integrate perception, decision, and action in a real-time closed loop. It dynamically weights visual, force, and thermal inputs and combines imitation learning with online closed-loop control. Contribution/Results: Deployed on the Dry-AIREC robotic platform, the method autonomously arrives at a staged stirring strategy (full-pot stirring → localized flipping → regional splitting) for unknown ingredients, demonstrating, for the first time, ingredient-agnostic adaptive cooking. Experiments confirm accurate response to egg phase transitions, successful execution of real-world cooking tasks, and cross-ingredient generalization, establishing a novel paradigm for embodied intelligence in dynamic physical interaction tasks.
📝 Abstract
To support humans in their daily lives, robots are required to autonomously learn, adapt to objects and environments, and perform appropriate actions. We tackled the task of cooking scrambled eggs using real ingredients, in which the robot needs to perceive the state of the egg and adjust its stirring movement in real time while the egg is heated and its state changes continuously. In previous works, handling changing objects was found to be challenging because sensory information is dynamic and contains both important and noisy information, and the modality that should be focused on changes over time, making it difficult to realize both perception and motion generation in real time. We propose a predictive recurrent neural network with an attention mechanism that weighs the sensor input according to how important and reliable each modality is, which realizes quick and efficient perception and motion generation. The model is trained by learning from demonstration and allows the robot to acquire human-like skills. We validated the proposed technique using the robot Dry-AIREC, and with our learning model it could cook eggs with unknown ingredients. The robot could change the stirring method and direction depending on the state of the egg: in the beginning it stirs the whole pot, and after the egg starts being heated it switches to flipping and splitting motions targeting specific areas, although we did not explicitly indicate them.
Problem

Research questions and friction points this paper is trying to address.

Autonomous robot manipulation of deformable objects with changing states
Real-time perception and motion generation during continuous state transitions
Multimodal sensory attention for distinguishing important versus noisy information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Recurrent neural network with attention mechanism
Weighs sensor input for important modalities
Enables real-time perception and motion generation
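
The idea of weighing sensor input per modality before a recurrent update can be illustrated with a small sketch. This is not the authors' model: the modality names, feature sizes, random weights, and the helper functions `softmax`, `attend`, and `rnn_step` are all assumptions made for illustration, and a plain tanh recurrent cell stands in for whatever predictive RNN the paper actually uses.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, score_weights):
    """Scale each modality's feature vector by a softmax attention weight.

    features: dict of modality name -> list of floats (hypothetical features)
    score_weights: dict of modality name -> list of floats, same lengths
    Returns (attention weights per modality, weighted features per modality).
    """
    names = sorted(features)
    # One scalar relevance score per modality (dot product with its weights).
    scores = [
        sum(w * x for w, x in zip(score_weights[n], features[n])) for n in names
    ]
    weights = dict(zip(names, softmax(scores)))
    weighted = {n: [weights[n] * x for x in features[n]] for n in names}
    return weights, weighted


def rnn_step(hidden, inputs, w_in, w_rec):
    """One tanh recurrent update: h' = tanh(W_in x + W_rec h)."""
    return [
        math.tanh(
            sum(w * x for w, x in zip(w_in[i], inputs))
            + sum(w * h for w, h in zip(w_rec[i], hidden))
        )
        for i in range(len(hidden))
    ]


# Hypothetical sensor features for a single time step (assumed values).
rng = random.Random(0)
features = {"vision": [0.9, 0.1, 0.4], "force": [0.2, 0.3], "temperature": [0.5]}
score_weights = {n: [rng.uniform(-1, 1) for _ in v] for n, v in features.items()}

weights, weighted = attend(features, score_weights)
inputs = [x for n in sorted(weighted) for x in weighted[n]]

hidden_size = 4
w_in = [[rng.uniform(-0.5, 0.5) for _ in inputs] for _ in range(hidden_size)]
w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
hidden = rnn_step([0.0] * hidden_size, inputs, w_in, w_rec)

print(weights)  # attention weight per modality; the weights sum to 1
```

In a trained model the scoring weights would be learned from demonstrations, so a noisy or uninformative modality would receive a small attention weight at that time step and contribute little to the recurrent state.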
Namiko Saito
Microsoft Research Asia - Tokyo
robot manipulation, multimodal learning, representation learning
Mayu Hiramoto
Department of Modern Mechanical Engineering, Waseda University, Tokyo, Japan
A. Kubo
Department of Modern Mechanical Engineering, Waseda University, Tokyo, Japan
Kanata Suzuki
Artificial Intelligence Laboratories, Fujitsu Limited, Kanagawa, Japan
Hiroshi Ito
Center for Technology Innovation - Controls and Robotics, Research & Development Group, Hitachi, Ltd., Ibaraki, Japan
S. Sugano
Department of Modern Mechanical Engineering, Waseda University, Tokyo, Japan
T. Ogata
Department of Intermedia Art and Science, Waseda University, Tokyo, Japan