🤖 AI Summary
To address insufficient safety assurance and interpretability in human-vehicle-environment interactions within electric vehicle (EV) and smart-grid integration, this paper proposes the first multimodal large language model (MLLM) framework tailored to vehicle-to-grid (V2G) scenarios. The framework integrates YOLOv8-based visual detection, semantic segmentation, CAN bus sensor data, and high-precision geolocation inputs, enabling end-to-end generation of natural-language safety alerts via customized prompt engineering. By embedding a large language model into the V2G real-time decision pipeline, it supports context-aware driver warnings (e.g., imminent pedestrian or non-motorized-vehicle risks), thereby enhancing transparency and responsiveness in human-vehicle collaborative decision-making. Evaluated on a real-world urban road dataset, the system consistently produces relevant, actionable alerts and demonstrates scalability to fleet-level traffic and energy co-optimization.
📝 Abstract
The integration of electric vehicles (EVs) into smart grids presents unique opportunities to enhance both transportation systems and energy networks. However, ensuring safe and interpretable interactions between drivers, vehicles, and the surrounding environment remains a critical challenge. This paper presents a multimodal large language model (LLM)-based framework that processes multimodal sensor data, such as object detection, semantic segmentation, and vehicular telemetry, and generates natural-language alerts for drivers. The framework is validated using data collected from instrumented vehicles on urban roads, ensuring its applicability to real-world scenarios. By combining visual perception (YOLOv8), geocoded positioning, and CAN bus telemetry, the framework bridges raw sensor data and driver comprehension, enabling safer and more informed decision-making in urban driving. Case studies using real data demonstrate the framework's effectiveness in generating context-aware alerts for critical situations, such as proximity to pedestrians, cyclists, and other vehicles. This paper highlights the potential of LLMs as assistive tools in e-mobility, benefiting both transportation systems and electric networks by enabling scalable fleet coordination, EV load forecasting, and traffic-aware energy planning.
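The prompt-engineering step described above can be pictured as fusing perception and telemetry into a structured prompt for the LLM. The sketch below is purely illustrative and not the authors' actual implementation: all names (`Detection`, `SafetyContext`, `build_alert_prompt`) and the 10 m risk threshold are assumptions, and the LLM call itself is omitted.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Detection:
    label: str          # e.g. "pedestrian", "cyclist" (a YOLOv8 class name)
    distance_m: float   # estimated distance from the ego vehicle

@dataclass
class SafetyContext:
    speed_kmh: float    # from CAN bus telemetry
    location: str       # geocoded position, e.g. a street name
    detections: List[Detection] = field(default_factory=list)

def build_alert_prompt(ctx: SafetyContext, risk_distance_m: float = 10.0) -> str:
    """Assemble a natural-language prompt describing near-range hazards,
    to be sent to an LLM that generates the driver-facing alert."""
    # Keep only objects close enough to be a plausible risk.
    hazards = [d for d in ctx.detections if d.distance_m <= risk_distance_m]
    lines = [
        "You are an in-vehicle safety assistant. "
        "Issue one short, actionable alert for the driver.",
        f"Vehicle speed: {ctx.speed_kmh:.0f} km/h near {ctx.location}.",
    ]
    if hazards:
        listed = "; ".join(f"{d.label} at {d.distance_m:.1f} m" for d in hazards)
        lines.append(f"Detected hazards: {listed}.")
    else:
        lines.append("No hazards detected within range.")
    return "\n".join(lines)

# Example: a nearby pedestrian is kept, a distant car is filtered out.
ctx = SafetyContext(
    speed_kmh=42.0,
    location="Elm St",
    detections=[Detection("pedestrian", 6.5), Detection("car", 25.0)],
)
prompt = build_alert_prompt(ctx)
```

In a deployed pipeline, `prompt` would be passed to the LLM along with segmentation-derived scene context; here it simply demonstrates how heterogeneous sensor streams can be serialized into one coherent request.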
Index Terms: Electric vehicles, visual perception, large language models, YOLOv8, semantic segmentation, CAN bus, prompt engineering, smart grid.