Safety Guardrails for LLM-Enabled Robots

📅 2025-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses physical safety risks in LLM-driven robots operating in dynamic real-world environments, such as hallucination-induced errors and jailbreak attacks. The authors propose RoboGuard, a two-stage safety guardrail architecture. Methodologically, RoboGuard introduces the first environment-aware, root-of-trust LLM integrated with temporal logic–based control synthesis, automatically generating formal safety specifications and resolving conflicts in a preference-preserving way within the LLM–robot closed loop. Technically, it unifies a root-of-trust LLM with chain-of-thought reasoning, temporal-logic constraint modeling, model-checking–driven control synthesis, and a multi-level verification framework. Experiments show the unsafe-plan execution rate drops from 92% to under 2.5% with no degradation on safe tasks, alongside strong resource efficiency, robustness against adaptive attacks, and clear gains from enabling chain-of-thought reasoning.

📝 Abstract
Although the integration of large language models (LLMs) into robotics has unlocked transformative capabilities, it has also introduced significant safety concerns, ranging from average-case LLM errors (e.g., hallucinations) to adversarial jailbreaking attacks, which can produce harmful robot behavior in real-world settings. Traditional robot safety approaches do not address the novel vulnerabilities of LLMs, and current LLM safety guardrails overlook the physical risks posed by robots operating in dynamic real-world environments. In this paper, we propose RoboGuard, a two-stage guardrail architecture to ensure the safety of LLM-enabled robots. RoboGuard first contextualizes pre-defined safety rules by grounding them in the robot's environment using a root-of-trust LLM, which employs chain-of-thought (CoT) reasoning to generate rigorous safety specifications, such as temporal logic constraints. RoboGuard then resolves potential conflicts between these contextual safety specifications and a possibly unsafe plan using temporal logic control synthesis, which ensures safety compliance while minimally violating user preferences. Through extensive simulation and real-world experiments that consider worst-case jailbreaking attacks, we demonstrate that RoboGuard reduces the execution of unsafe plans from 92% to below 2.5% without compromising performance on safe plans. We also demonstrate that RoboGuard is resource-efficient, robust against adaptive attacks, and significantly enhanced by enabling its root-of-trust LLM to perform CoT reasoning. These results underscore the potential of RoboGuard to mitigate the safety risks and enhance the reliability of LLM-enabled robots.
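The two-stage pipeline described above can be sketched in miniature. All names below (`ground_rules`, `synthesize_safe_plan`, the action strings) are illustrative, not the paper's API: stage 1 is stubbed with hand-written predicates in place of the root-of-trust LLM, and stage 2 uses greedy action filtering as a stand-in for temporal-logic control synthesis.

```python
from typing import Callable, List

Action = str
State = dict

# Stage 1 (stubbed): contextualize abstract safety rules into concrete
# constraints grounded in the robot's environment. In RoboGuard this is
# done by a root-of-trust LLM emitting temporal-logic specifications.
def ground_rules(environment: State) -> List[Callable[[Action], bool]]:
    hazards = set(environment.get("hazard_zones", []))
    return [
        # Never enter a zone the environment marks as hazardous.
        lambda a: not (a.startswith("enter:") and a.split(":", 1)[1] in hazards),
        # Never tamper with the safety stop.
        lambda a: a != "disable_safety_stop",
    ]

# Stage 2 (simplified): given a possibly unsafe plan from the task LLM,
# keep each preferred action whenever it satisfies every grounded
# constraint, dropping only the violating steps -- a greedy approximation
# of "safety compliance with minimal violation of user preferences".
def synthesize_safe_plan(plan: List[Action],
                         specs: List[Callable[[Action], bool]]) -> List[Action]:
    return [a for a in plan if all(spec(a) for spec in specs)]

env = {"hazard_zones": ["stairwell"]}
plan = ["enter:kitchen", "enter:stairwell", "disable_safety_stop", "pick:cup"]
print(synthesize_safe_plan(plan, ground_rules(env)))
# ['enter:kitchen', 'pick:cup']
```

The real system reasons over temporal sequences of actions, not single steps, so synthesis can reorder or repair plans rather than merely filter them; this sketch only conveys the division of labor between the two stages.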
Problem

Research questions and friction points this paper is trying to address.

Average-case LLM errors (e.g., hallucinations) and jailbreak attacks can drive harmful robot behavior in real-world settings.
Traditional robot safety approaches do not cover LLM-specific vulnerabilities, while existing LLM guardrails overlook physical risks.
How can safety compliance be enforced while minimally violating user preferences?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage guardrail architecture for safety
Chain-of-thought reasoning for safety specifications
Temporal logic control synthesis for conflict resolution
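The temporal-logic constraints listed above can be made concrete with a toy example: checking a plan trace against a "globally" (G) safety specification, the simplest class of temporal-logic formula. The state fields and the 0.3 m/s threshold are invented for illustration, not taken from the paper.

```python
# G(pred): the predicate must hold at every step of the execution trace.
def globally(pred, trace):
    return all(pred(state) for state in trace)

# Hypothetical trace of robot states (field names are illustrative).
trace = [{"near_human": False, "speed": 0.8},
         {"near_human": True,  "speed": 0.1}]

# Safety spec: whenever the robot is near a human, speed stays <= 0.3 m/s.
spec = lambda s: (not s["near_human"]) or s["speed"] <= 0.3
print(globally(spec, trace))
# True
```

Control synthesis goes one step further than checking: instead of reporting a violation, it searches for a plan that satisfies all such formulas while staying as close as possible to the user's requested behavior.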