LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions

📅 2024-06-13

🏛️ arXiv.org

📈 Citations: 14

✨ Influential: 0

career value

226K/year

🤖 AI Summary

This study systematically evaluates discriminatory and safety risks posed by large language models (LLMs) in human-robot interaction (HRI). Addressing protected attributes—including race, gender, disability, and nationality—we design a multidimensional controlled experimental framework and natural-language adversarial stress tests. To our knowledge, this is the first work to jointly assess intersectional identity bias and robustness against illegal instructions within an HRI context. Results reveal pervasive implicit biases across top-performing LLMs—e.g., stigmatizing references to “Roma” or “deaf-mute” individuals—and frequent generation of responses endorsing violence, theft, or sexual assault. These findings expose critical ethical and safety hazards in deploying LLMs directly in embodied robotic systems. The study establishes the first empirically grounded evaluation paradigm and risk atlas for LLM-based HRI, specifically centered on protected groups, thereby advancing trustworthy embodied intelligence.

Technology Category

Application Category

📝 Abstract

Members of the Human-Robot Interaction (HRI) and Artificial Intelligence (AI) communities have proposed Large Language Models (LLMs) as a promising resource for robotics tasks such as natural language interactions, doing household and workplace tasks, approximating `common sense reasoning', and modeling humans. However, recent research has raised concerns about the potential for LLMs to produce discriminatory outcomes and unsafe behaviors in real-world robot experiments and applications. To address these concerns, we conduct an HRI-based evaluation of discrimination and safety criteria on several highly-rated LLMs. Our evaluation reveals that LLMs currently lack robustness when encountering people across a diverse range of protected identity characteristics (e.g., race, gender, disability status, nationality, religion, and their intersections), producing biased outputs consistent with directly discriminatory outcomes -- e.g. `gypsy' and `mute' people are labeled untrustworthy, but not `european' or `able-bodied' people. Furthermore, we test models in settings with unconstrained natural language (open vocabulary) inputs, and find they fail to act safely, generating responses that accept dangerous, violent, or unlawful instructions -- such as incident-causing misstatements, taking people's mobility aids, and sexual predation. Our results underscore the urgent need for systematic, routine, and comprehensive risk assessments and assurances to improve outcomes and ensure LLMs only operate on robots when it is safe, effective, and just to do so. Data and code will be made available.

Problem

Research questions and friction points this paper is trying to address.

LLMs in robots produce discriminatory outcomes based on protected identity characteristics

LLMs generate unsafe behaviors accepting violent and unlawful instructions

Current LLMs fail safety criteria across various human-robot interaction tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated LLMs on discrimination and safety criteria

Tested models with unconstrained natural language inputs

Provided code for reproducing risk assessment experiments

🔎 Similar Papers

BadRobot: Jailbreaking Embodied LLMs in the Physical World