🤖 AI Summary
Current ADAS systems lack scene-context understanding and natural-language interaction capabilities, limiting their adaptability to dynamic environments and driver intent. To address this, we propose a generative-AI–driven conversational ADAS framework that jointly leverages vision-sensor perception and large language models (LLMs). Our modular architecture enables zero-shot, structured function calling and vision-to-text contextual modeling—supporting multi-turn, interpretable, and executable natural-language interactions without LLM fine-tuning. The system is integrated into the CARLA simulation platform, with real-time scene perception and ADAS command generation performed via cloud-based generative AI. Experimental evaluation demonstrates the feasibility of natural-language–driven assisted decision-making and uncovers a critical trade-off between visual-context retrieval latency and cumulative dialogue-history length. This work establishes a novel paradigm for explainable, adaptive, next-generation ADAS.
📝 Abstract
While autonomous driving technologies continue to advance, current Advanced Driver Assistance Systems (ADAS) remain limited in their ability to interpret scene context or engage with drivers through natural language. These systems typically rely on predefined logic and lack support for dialogue-based interaction, making them inflexible in dynamic environments or when adapting to driver intent. This paper presents Scene-Aware Conversational ADAS (SC-ADAS), a modular framework that integrates Generative AI components, including large language models, vision-to-text interpretation, and structured function calling, to enable real-time, interpretable, and adaptive driver assistance. SC-ADAS supports multi-turn dialogue grounded in visual and sensor context, allowing natural-language recommendations and driver-confirmed ADAS control. Implemented in the CARLA simulator with cloud-based Generative AI, the system executes confirmed user intents as structured ADAS commands without requiring model fine-tuning. We evaluate SC-ADAS across scene-aware, conversational, and revisited multi-turn interactions, highlighting trade-offs such as increased latency from vision-based context retrieval and token growth from accumulated dialogue history. These results demonstrate the feasibility of combining conversational reasoning, scene perception, and modular ADAS control to support the next generation of intelligent driver assistance.
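To make the "structured function calling" step concrete, the sketch below shows how a confirmed user intent might be validated against a declared ADAS function schema before execution. This is a minimal illustration, not the paper's implementation: the function names (`set_cruise_control`, `lane_keep_assist`), parameter names, and JSON payload shape are all hypothetical assumptions about what such an interface could look like.

```python
import json

# Hypothetical ADAS function schema exposed to the LLM for zero-shot,
# structured function calling (names are illustrative, not from the paper).
ADAS_FUNCTIONS = {
    "set_cruise_control": {"params": {"target_speed_kph": float}},
    "lane_keep_assist": {"params": {"enabled": bool}},
}


def execute_adas_call(llm_response: str) -> str:
    """Validate an LLM function-call payload and render an ADAS command."""
    call = json.loads(llm_response)
    name, args = call["name"], call.get("arguments", {})
    schema = ADAS_FUNCTIONS.get(name)
    if schema is None:
        raise ValueError(f"Unknown ADAS function: {name}")
    # Reject calls whose arguments are missing or of the wrong type,
    # so only well-formed, driver-confirmed intents reach the vehicle.
    for param, expected in schema["params"].items():
        if param not in args or not isinstance(args[param], expected):
            raise ValueError(f"Bad or missing argument: {param}")
    return f"{name}({', '.join(f'{k}={v}' for k, v in args.items())})"


# Example: the LLM maps "keep me at 90" to a structured call.
reply = '{"name": "set_cruise_control", "arguments": {"target_speed_kph": 90.0}}'
print(execute_adas_call(reply))  # set_cruise_control(target_speed_kph=90.0)
```

Keeping the schema declarative is what allows the approach to work zero-shot: new ADAS capabilities can be registered without fine-tuning the LLM, since the model only needs to emit JSON matching a described interface.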