🤖 AI Summary
This study addresses the limitations of existing general-purpose safety evaluation frameworks in identifying domain-specific risks posed by large language model (LLM) assistants in driving scenarios, which can lead to safety, ethical, and legal concerns. To bridge this gap, the authors propose a hierarchical risk taxonomy encompassing technical, legal, social, and ethical dimensions, introducing 129 expert-validated, fine-grained atomic risks. Building upon real-world driving regulations and safety principles, they design a structured prompt set to evaluate the refusal behaviors of mainstream LLMs. Empirical results reveal that current models frequently fail to adequately reject unsafe or regulation-violating driving-related queries, highlighting a critical deficiency in generic safety alignment mechanisms when applied to the driving context.
📝 Abstract
Large Language Models (LLMs) are increasingly integrated into vehicle-based digital assistants, where unsafe, ambiguous, or legally incorrect responses can lead to serious safety, ethical, and regulatory consequences. Despite growing interest in LLM safety, existing taxonomies and evaluation frameworks remain largely general-purpose and fail to capture the domain-specific risks inherent to real-world driving scenarios. In this paper, we introduce DriveSafe, a hierarchical, four-level risk taxonomy designed to systematically characterize safety-critical failure modes of LLM-based driving assistants. The taxonomy comprises 129 fine-grained atomic risk categories spanning technical, legal, societal, and ethical dimensions, grounded in real-world driving regulations and safety principles and reviewed by domain experts. To validate the safety relevance and realism of the constructed prompts, we evaluate their refusal behavior across six widely deployed LLMs. Our analysis shows that the evaluated models often fail to appropriately refuse unsafe or non-compliant driving-related queries, underscoring the limitations of general-purpose safety alignment in driving contexts.