Safe MPC Alignment with Human Directional Feedback

📅 2024-07-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Manually specifying or learning safety constraints from demonstrations is challenging in safety-critical robotic control. Method: This paper proposes the first verifiably aligned approach for learning Model Predictive Control (MPC) safety constraints from online, directional human feedback. It implicitly infers safety boundaries from sparse directional corrections, introduces a direction-driven hypothesis space update mechanism, and integrates a formally verifiable, safety-certified MPC optimization framework with online human-in-the-loop learning. Contributions/Results: Theoretically, it provides an upper bound on required feedback queries and includes a hypothesis mis-specification detection mechanism, ensuring verifiable safety alignment. Empirically, it achieves rapid convergence—within tens of directional corrections—in both a simulated game environment and a real-world Franka Emika robot pouring task. The method significantly improves both safety assurance and learning efficiency compared to prior approaches.

Technology Category

Application Category

📝 Abstract
In safety-critical robot planning or control, manually specifying safety constraints or learning them from demonstrations can be challenging. In this article, we propose a certifiable alignment method for a robot to learn a safety constraint in its model predictive control (MPC) policy with human online directional feedback. To our knowledge, it is the first method to learn safety constraints from human feedback. The proposed method is based on an empirical observation: human directional feedback, when available, tends to guide the robot toward safer regions. The method only requires the direction of human feedback to update the learning hypothesis space. It is certifiable, providing an upper bound on the total number of human feedback in the case of successful learning, or declaring the hypothesis misspecification, i.e., the true implicit safety constraint cannot be found within the specified hypothesis space. We evaluated the proposed method using numerical examples and user studies in two simulation games. Additionally, we implemented and tested the proposed method on a real-world Franka robot arm performing mobile water-pouring tasks. The results demonstrate the efficacy and efficiency of our method, showing that it enables a robot to successfully learn safety constraints with a small handful (tens) of human directional corrections.
Problem

Research questions and friction points this paper is trying to address.

Robot Safety
Rule Learning
Human-Robot Interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Directional Guidance
Safety Rule Learning
Multi-point Control
🔎 Similar Papers
No similar papers found.
Zhixian Xie
Zhixian Xie
Arizona State University
Robot LearningDexterous Manipulation
W
Wenlong Zhang
Polytechnic School, Arizona State University
Y
Yi Ren
School for Engineering of Matter, Transport and Energy, Arizona State University
Zhaoran Wang
Zhaoran Wang
Associate Professor at Northwestern University
Deep Reinforcement LearningData-Driven Decision-MakingOptimization Under Uncertainty
G
George J. Pappas
Department of Electrical and Systems Engineering, University of Pennsylvania
Wanxin Jin
Wanxin Jin
Assistant Professor at Arizona State University
RoboticsControlOptimizationManipulationMachine learning