🤖 AI Summary
Large language model (LLM)-driven autonomous driving systems are insufficiently reliable at detecting small, safety-critical objects—such as traffic lights and signs—and lack explicit rule-based constraints. Method: This paper proposes TLS-Assist, a plug-and-play, model-agnostic redundancy layer that improves robustness to safety-critical cues by converting structured perception outputs into natural language messages injected into the LLM's decision-making pipeline, enforcing explicit attention to traffic lights and signs. The framework supports both single-view and multi-view camera setups and enables closed-loop control in CARLA. Results: On the LangAuto benchmark, TLS-Assist achieves relative driving performance improvements of up to 14% over LMDrive and 7% over BEVDriver, while consistently reducing traffic light and sign infractions. It is the first approach to explicitly model these safety-critical fine-grained objects and inject interpretable, rule-based constraints across the perception–language–decision pipeline.
📝 Abstract
Large Language Models (LLMs) are increasingly used for decision-making and planning in autonomous driving, showing promising reasoning capabilities and potential to generalize across diverse traffic situations. However, current LLM-based driving agents lack explicit mechanisms to enforce traffic rules and often struggle to reliably detect small, safety-critical objects such as traffic lights and signs. To address these limitations, we introduce TLS-Assist, a modular redundancy layer that augments LLM-based autonomous driving agents with explicit traffic light and sign recognition. TLS-Assist converts detections into structured natural language messages that are injected into the LLM input, enforcing explicit attention to safety-critical cues. The framework is plug-and-play, model-agnostic, and supports both single-view and multi-view camera setups. We evaluate TLS-Assist in a closed-loop setup on the LangAuto benchmark in CARLA. The results demonstrate relative driving performance improvements of up to 14% over LMDrive and 7% over BEVDriver, while consistently reducing traffic light and sign infractions. We publicly release the code and models at https://github.com/iis-esslingen/TLS-Assist.
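The core mechanism described above — turning structured detections into a natural-language message prepended to the LLM's input — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation; the `Detection` fields, message wording, and function names are all assumptions for the sake of the example.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """Hypothetical output of the traffic light/sign detector."""
    label: str        # e.g. "traffic light" or "stop sign"
    state: str        # e.g. "red"; for signs, the sign type
    distance_m: float # estimated distance to the object

def detections_to_message(detections: list[Detection]) -> str:
    """Format detections as a natural-language safety message."""
    if not detections:
        return "No traffic lights or signs detected."
    parts = [f"{d.label} ({d.state}) at {d.distance_m:.0f} m ahead"
             for d in detections]
    return "Attention: " + "; ".join(parts) + "."

def inject_into_prompt(instruction: str, detections: list[Detection]) -> str:
    """Prepend the safety message to the navigation instruction
    before it is passed to the LLM-based driving agent."""
    return detections_to_message(detections) + " " + instruction

prompt = inject_into_prompt(
    "Continue straight and turn left at the next intersection.",
    [Detection("traffic light", "red", 12.0)],
)
# prompt → "Attention: traffic light (red) at 12 m ahead. Continue straight and turn left at the next intersection."
```

Because the injection happens purely at the text level, a layer like this stays model-agnostic: it requires no changes to the LLM itself, only to the prompt it receives.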