C$^2$T: Captioning-Structure and LLM-Aligned Common-Sense Reward Learning for Traffic–Vehicle Coordination

📅 2026-04-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing urban traffic signal control systems, which rely on handcrafted, short-sighted reward functions (for example, intersection pressure) that fail to capture high-level objectives such as safety, flow stability, and passenger comfort. To overcome this, the authors propose the C²T framework, which distills common-sense knowledge from a large language model (LLM) into a learned intrinsic reward function that guides cooperative multi-intersection signal control. The framework combines multi-agent reinforcement learning (MARL) with LLM knowledge distillation and is evaluated on CityFlow-based multi-intersection benchmarks, where it is reported to significantly outperform strong MARL baselines in traffic efficiency, safety, and an energy-related proxy. It also allows the policy to be steered, for example toward efficiency-focused or safety-focused behavior, by modifying the LLM prompt.

📝 Abstract

State-of-the-art (SOTA) urban traffic control increasingly employs Multi-Agent Reinforcement Learning (MARL) to coordinate Traffic Light Controllers (TLCs) and Connected Autonomous Vehicles (CAVs). However, the performance of these systems is fundamentally capped by their hand-crafted, myopic rewards (e.g., intersection pressure), which fail to capture high-level, human-centric goals like safety, flow stability, and comfort. To overcome this limitation, we introduce C2T, a novel framework that learns a common-sense coordination model from traffic-vehicle dynamics. C2T distills "common-sense" knowledge from a Large Language Model (LLM) into a learned intrinsic reward function. This new reward is then used to guide the coordination policy of a cooperative multi-intersection TLC MARL system on CityFlow-based multi-intersection benchmarks. Our framework significantly outperforms strong MARL baselines in traffic efficiency, safety, and an energy-related proxy. We further highlight C2T's flexibility in principle, allowing distinct "efficiency-focused" versus "safety-focused" policies by modifying the LLM prompt.
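The abstract does not say how the LLM's knowledge is distilled into a reward, so the following is only a minimal sketch under stated assumptions: it treats distillation as preference learning with a Bradley-Terry (logistic) loss over pairs of traffic snapshots, then adds the learned reward to an environment reward. The feature vector (normalized queue length, normalized hard-braking count), the `llm_preference_stub` function standing in for a real LLM call, and the mixing weight `beta` are all hypothetical and not taken from the paper.

```python
import math
import random


def llm_preference_stub(feat_a, feat_b):
    """Hypothetical stand-in for an LLM judgment (not the paper's prompt).

    Features are [queue_length, hard_braking], both normalized to [0, 1].
    Returns 1 if snapshot A is preferred over snapshot B, else 0.
    """
    def score(f):
        return -(f[0] + 2.0 * f[1])  # assumed "common sense": less is better
    return 1 if score(feat_a) > score(feat_b) else 0


def fit_reward(pairs, labels, dim, lr=0.1, epochs=200):
    """Fit a linear reward r(x) = w . x from pairwise preferences.

    Models P(A preferred over B) = sigmoid(r(A) - r(B)) and performs
    stochastic gradient ascent on the log-likelihood.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for (a, b), y in zip(pairs, labels):
            diff = [ai - bi for ai, bi in zip(a, b)]
            logit = sum(wi * di for wi, di in zip(w, diff))
            logit = max(-30.0, min(30.0, logit))  # guard against overflow
            p = 1.0 / (1.0 + math.exp(-logit))
            for i in range(dim):
                w[i] += lr * (y - p) * diff[i]
    return w


def shaped_reward(env_reward, features, w, beta=0.5):
    """Combine the environment reward with the learned intrinsic reward."""
    intrinsic = sum(wi * xi for wi, xi in zip(w, features))
    return env_reward + beta * intrinsic


# Synthetic snapshot pairs labeled by the stub "LLM".
rng = random.Random(0)
pairs = [
    ([rng.random(), rng.random()], [rng.random(), rng.random()])
    for _ in range(200)
]
labels = [llm_preference_stub(a, b) for a, b in pairs]
w = fit_reward(pairs, labels, dim=2)
```

In this sketch, changing the scoring rule inside the stub plays the role that changing the LLM prompt plays in the abstract: a rule that weights hard braking more heavily would yield a more safety-focused learned reward. In a MARL setup, each traffic light controller would receive `shaped_reward(...)` in place of its raw environment reward.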

Problem

Research questions and friction points this paper is trying to address.

Multi-Agent Reinforcement Learning

Traffic Light Control

Connected Autonomous Vehicles

Reward Design

Common-Sense Reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Common-sense reward learning

Large Language Model (LLM)

Multi-Agent Reinforcement Learning (MARL)