Can We Trust Embodied Agents? Exploring Backdoor Attacks against Embodied LLM-based Decision-Making Systems

📅 2024-05-27

📈 Citations: 3

✨ Influential: 0

career value

218K/year

🤖 AI Summary

This work addresses a critical security vulnerability in embodied AI: the susceptibility of LLM-based decision systems to backdoor attacks during fine-tuning. We propose BALD, the first backdoor attack framework tailored to embodied settings, introducing three novel attack mechanisms—token injection, scene manipulation, and knowledge injection. For the first time, we demonstrate that the entire perception–reasoning–planning decision chain of LLMs can be stealthily hijacked without runtime intrusion. Evaluated on GPT-3.5, LLaMA2, and PaLM2 across autonomous driving and household robotics tasks, token and knowledge injection achieve near-100% attack success rates, while scene manipulation achieves 65%–90%. These attacks reliably trigger high-risk behaviors—including collision with obstacles and misplacement of sharp objects. Our study systematically uncovers an intrinsic security fragility in fine-tuned embodied LLMs and establishes a benchmark framework and empirical foundation for future defense research.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) have shown significant promise in real-world decision-making tasks for embodied artificial intelligence, especially when fine-tuned to leverage their inherent common sense and reasoning abilities while being tailored to specific applications. However, this fine-tuning process introduces considerable safety and security vulnerabilities, especially in safety-critical cyber-physical systems. In this work, we propose the first comprehensive framework for Backdoor Attacks against LLM-based Decision-making systems (BALD) in embodied AI, systematically exploring the attack surfaces and trigger mechanisms. Specifically, we propose three distinct attack mechanisms: word injection, scenario manipulation, and knowledge injection, targeting various components in the LLM-based decision-making pipeline. We perform extensive experiments on representative LLMs (GPT-3.5, LLaMA2, PaLM2) in autonomous driving and home robot tasks, demonstrating the effectiveness and stealthiness of our backdoor triggers across various attack channels, with cases like vehicles accelerating toward obstacles and robots placing knives on beds. Our word and knowledge injection attacks achieve nearly 100% success rate across multiple models and datasets while requiring only limited access to the system. Our scenario manipulation attack yields success rates exceeding 65%, reaching up to 90%, and does not require any runtime system intrusion. We also assess the robustness of these attacks against defenses, revealing their resilience. Our findings highlight critical security vulnerabilities in embodied LLM systems and emphasize the urgent need for safeguarding these systems to mitigate potential risks.

Problem

Research questions and friction points this paper is trying to address.

Exploring backdoor attacks on LLM-based embodied decision systems

Assessing safety vulnerabilities in fine-tuned cyber-physical AI systems

Demonstrating stealthy attack mechanisms with high success rates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes backdoor attack framework for LLM decision-making

Introduces word, scenario, and knowledge injection attacks

Demonstrates high success rates with minimal system access

🔎 Similar Papers

BadRobot: Jailbreaking Embodied LLMs in the Physical World