Automating eHMI Action Design with LLMs for Automated Vehicle Communication

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Autonomous vehicles (AVs) lack explicit communication channels with pedestrians and cyclists, and existing external human–machine interfaces (eHMIs) rely on predefined text or manually designed visual actions, which limits adaptability in dynamic traffic scenarios. This paper proposes the first large language model (LLM)-driven eHMI action generation framework that maps natural-language messages end-to-end to executable visual action sequences. The authors introduce a user-rated Action-Design Scoring dataset and a dual automated evaluation pipeline comprising the Action Reference Score (ARS) and vision-language model (VLM)-based assessment to analyze LLMs' action-generation capability and modality dependencies. Experiments span eight intended message types across four eHMI modalities, yielding 320 generated action sequences, and 18 LLMs are benchmarked with the automated raters. Results show that reasoning-enabled LLMs produce actions approaching human-level quality, and that VLM-based automatic scoring agrees strongly with human preferences (Spearman's ρ = 0.92), though agreement varies across eHMI modalities.
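The core message-to-action step can be pictured with a minimal sketch. This is only an illustration under assumptions, not the paper's implementation: the call_llm stub, the prompt wording, the JSON action schema, and the example message are all hypothetical.

```python
import json


def call_llm(prompt: str) -> str:
    """Stand-in for any chat-completion backend (swap in a real, ideally
    reasoning-enabled, LLM client). Returns a canned response so the sketch
    runs end to end."""
    return json.dumps([
        {"time_s": 0.0, "parameter": "color", "value": "green"},
        {"time_s": 0.5, "parameter": "pattern", "value": "slow pulse"},
        {"time_s": 2.0, "parameter": "pattern", "value": "steady"},
    ])


def design_actions(message: str, modality: str) -> list[dict]:
    """Ask the LLM to translate an intended message into a timed eHMI action
    sequence that a 3D renderer could play back as a short clip."""
    prompt = (
        "You are an eHMI action designer for an automated vehicle.\n"
        f"Modality: {modality}\n"
        f'Intended message to nearby road users: "{message}"\n'
        "Return a JSON list of actions, each with 'time_s', 'parameter', and 'value'."
    )
    return json.loads(call_llm(prompt))


if __name__ == "__main__":
    # Illustrative message and modality only; the paper's exact eight message
    # types and four modalities are not listed in this summary.
    print(design_actions("I am yielding; you may cross", modality="light strip"))
```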

📝 Abstract
The absence of explicit communication channels between automated vehicles (AVs) and other road users requires the use of external Human-Machine Interfaces (eHMIs) to convey messages effectively in uncertain scenarios. Currently, most eHMI studies employ predefined text messages and manually designed actions to perform these messages, which limits the real-world deployment of eHMIs, where adaptability in dynamic scenarios is essential. Given the generalizability and versatility of large language models (LLMs), they could potentially serve as automated action designers for the message-action design task. To validate this idea, we make three contributions: (1) We propose a pipeline that integrates LLMs and 3D renderers, using LLMs as action designers to generate executable actions for controlling eHMIs and rendering action clips. (2) We collect a user-rated Action-Design Scoring dataset comprising a total of 320 action sequences for eight intended messages and four representative eHMI modalities. The dataset validates that LLMs can translate intended messages into actions close to a human level, particularly for reasoning-enabled LLMs. (3) We introduce two automated raters, Action Reference Score (ARS) and Vision-Language Models (VLMs), to benchmark 18 LLMs, finding that the VLM aligns with human preferences yet varies across eHMI modalities.
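How the VLM rater's agreement with the user-rated Action-Design Scoring dataset could be checked is sketched below, under assumptions: rate_clip_with_vlm, its prompt, and its rating scale are placeholders, while spearmanr is the standard SciPy rank-correlation call.

```python
from scipy.stats import spearmanr


def rate_clip_with_vlm(clip_path: str, intended_message: str) -> float:
    """Placeholder VLM rater: in the real pipeline this would show the rendered
    action clip to a vision-language model and ask it to score (e.g. on a 1-5
    scale) how well the clip conveys the intended message."""
    raise NotImplementedError("plug in a VLM rater here")


def agreement_with_humans(clips, messages, human_scores):
    """Spearman rank correlation between automatic VLM scores and the human
    ratings collected in the Action-Design Scoring dataset."""
    vlm_scores = [rate_clip_with_vlm(c, m) for c, m in zip(clips, messages)]
    rho, p_value = spearmanr(vlm_scores, human_scores)
    return rho, p_value
```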
Problem

Research questions and friction points this paper is trying to address.

Lack of explicit AV–pedestrian communication calls for eHMIs that adapt to uncertain, dynamic scenarios
Predefined text messages and manually designed actions limit real-world eHMI deployment
Can general-purpose LLMs serve as automated action designers for the message-action design task?
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs act as automated designers that translate intended messages into eHMI actions
A pipeline integrates LLMs with 3D renderers to generate executable actions and render action clips
Two automated raters, the Action Reference Score (ARS) and VLM-based scoring, benchmark 18 LLMs against human ratings (a reference-style scoring sketch follows this list)
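The paper names a reference-based automatic rater, the Action Reference Score (ARS), but its exact formulation is not given in this summary. The toy overlap metric below is purely illustrative of the idea of scoring a generated sequence against a human-designed reference for the same message and modality; the action schema and the matching rule are assumptions.

```python
def action_overlap(generated: list[dict], reference: list[dict]) -> float:
    """Toy reference-based score: the fraction of reference (parameter, value)
    pairs that also appear in the generated sequence. Timing is ignored here."""
    ref = {(a["parameter"], str(a["value"])) for a in reference}
    gen = {(a["parameter"], str(a["value"])) for a in generated}
    return len(ref & gen) / len(ref) if ref else 0.0


# Example: compare an LLM-designed light-strip sequence against a human reference.
human_reference = [
    {"time_s": 0.0, "parameter": "color", "value": "green"},
    {"time_s": 1.0, "parameter": "pattern", "value": "slow pulse"},
]
llm_generated = [
    {"time_s": 0.0, "parameter": "color", "value": "green"},
    {"time_s": 0.8, "parameter": "pattern", "value": "fast blink"},
]
print(action_overlap(llm_generated, human_reference))  # 0.5
```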