🤖 AI Summary
Large language model (LLM)-driven autonomous driving systems are insufficiently reliable at detecting small, safety-critical objects—such as traffic lights and signs—and lack explicit rule-based constraints. Method: This paper proposes TLS-Assist, a plug-and-play, model-agnostic redundancy layer that improves robustness to safety-critical cues by converting structured perception outputs into natural language messages injected into the LLM's decision-making pipeline, enforcing explicit attention to traffic lights and signs. The framework supports both single-view and multi-view camera setups and enables closed-loop control in CARLA. Results: On the LangAuto benchmark, TLS-Assist achieves relative driving performance improvements of up to 14% over LMDrive and 7% over BEVDriver, while consistently reducing traffic light and sign infractions. It is the first approach to explicitly model these safety-critical fine-grained objects and inject interpretable, rule-based constraints across the perception–language–decision pipeline.
📝 Abstract
Large Language Models (LLMs) are increasingly used for decision-making and planning in autonomous driving, showing promising reasoning capabilities and potential to generalize across diverse traffic situations. However, current LLM-based driving agents lack explicit mechanisms to enforce traffic rules and often struggle to reliably detect small, safety-critical objects such as traffic lights and signs. To address these limitations, we introduce TLS-Assist, a modular redundancy layer that augments LLM-based autonomous driving agents with explicit traffic light and sign recognition. TLS-Assist converts detections into structured natural language messages that are injected into the LLM input, enforcing explicit attention to safety-critical cues. The framework is plug-and-play, model-agnostic, and supports both single-view and multi-view camera setups. We evaluate TLS-Assist in a closed-loop setup on the LangAuto benchmark in CARLA. The results demonstrate relative driving performance improvements of up to 14% over LMDrive and 7% over BEVDriver, while consistently reducing traffic light and sign infractions. We publicly release the code and models at https://github.com/iis-esslingen/TLS-Assist.
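The core mechanism described above — turning structured detections into a natural-language message prepended to the LLM's input — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation; the `Detection` fields, message wording, and function names are all assumptions for the sake of the example.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """Hypothetical output of the traffic light/sign detector."""
    label: str        # e.g. "traffic light" or "stop sign"
    state: str        # e.g. "red"; for signs, the sign type
    distance_m: float # estimated distance to the object

def detections_to_message(detections: list[Detection]) -> str:
    """Format detections as a natural-language safety message."""
    if not detections:
        return "No traffic lights or signs detected."
    parts = [f"{d.label} ({d.state}) at {d.distance_m:.0f} m ahead"
             for d in detections]
    return "Attention: " + "; ".join(parts) + "."

def inject_into_prompt(instruction: str, detections: list[Detection]) -> str:
    """Prepend the safety message to the navigation instruction
    before it is passed to the LLM-based driving agent."""
    return detections_to_message(detections) + " " + instruction

prompt = inject_into_prompt(
    "Continue straight and turn left at the next intersection.",
    [Detection("traffic light", "red", 12.0)],
)
# prompt → "Attention: traffic light (red) at 12 m ahead. Continue straight and turn left at the next intersection."
```

Because the injection happens purely at the text level, a layer like this stays model-agnostic: it requires no changes to the LLM itself, only to the prompt it receives.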