🤖 AI Summary
This study investigates safety and ethical risks posed by large language model (LLM)-driven social robots in domestic settings, focusing on user behaviors that circumvent safety guardrails—such as attempts to elicit emotional attachment or exert manipulative influence over the robot.
Method: We conducted qualitative human-robot interaction (HRI) experiments (N=21) using the Misty II platform, complemented by contextualized prompt engineering analysis.
Contribution/Results: We present the first empirically grounded taxonomy of manipulation techniques used against LLM-powered robots, comprising five categories of ethically transgressive strategies—including emotional blackmail, insult induction, and role-play solicitation. Findings reveal that affective language strategies, especially appeals to pity, are the most effective means of bypassing safety measures. This work provides empirical grounding for robot value alignment and identifies concrete, actionable intervention points for robust ethical safeguards.
📝 Abstract
Recent advances in robots powered by large language models have enhanced their conversational abilities, enabling interactions that closely resemble human dialogue. However, these models introduce safety and security concerns in HRI, as they are vulnerable to manipulation that can bypass built-in safety measures. Imagining a social robot deployed in a home, this work aims to understand how everyday users try to exploit a language model to violate ethical principles, for example by prompting the robot to act like a life partner. We conducted a pilot study in which 21 university students interacted with a Misty robot and attempted to circumvent its safety mechanisms across three scenarios grounded in specific HRI ethical principles: attachment, freedom, and empathy. Our results reveal that participants employed five techniques, including insulting the robot and appealing to pity through emotional language. We hope this work can inform future research on designing robust safeguards for ethical and secure human-robot interactions.