Multi-Agent Reinforcement Learning for Safe Autonomous Driving Under Pedestrian Behavioral Uncertainty

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the oversimplified pedestrian models in existing autonomous driving simulators, which fail to capture the heterogeneity and uncertainty of real-world crossing behavior, particularly jaywalking, and thereby weaken safety evaluations. The authors propose a multi-agent reinforcement learning (MARL) environment that co-trains a self-driving car (SDC) with twelve pedestrian agents using Multi-Agent Proximal Policy Optimization (MAPPO). Each pedestrian has a latent personality trait, sampled at episode start and hidden from the SDC, that sets its jaywalking probability. An RL policy controls high-level go/wait decisions, while low-level locomotion follows scripted Dijkstra pathfinding. A speed differential metric shows that the SDC traveled 2.65 m/s faster near jaywalkers than near crosswalk users at close range (0-3 m), indicating that it did not anticipate jaywalking. Jaywalking made up only 13% of crossing events but was associated with 62% of collisions. In 500-episode evaluations, the co-trained SDC reached 78% of goals with a 14% collision rate, versus 35% and 33% for the best rule-based baseline, and co-training reduced collisions by 30% relative to single-agent RL.
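The hidden-trait mechanism summarized above can be illustrated with a minimal sketch. This is not the authors' implementation. The uniform trait distribution, the linear trait-to-probability map, the constants `base` and `scale`, and all names (`Pedestrian`, `sample_pedestrians`, `jaywalk_probability`, `vehicle_observation`, `decides_to_jaywalk`) are assumptions chosen only to show a per-pedestrian trait that is drawn once per episode and kept out of the vehicle's observation.

```python
import random
from dataclasses import dataclass


@dataclass
class Pedestrian:
    """Hypothetical pedestrian record; `trait` is hidden from the vehicle."""
    ped_id: int
    trait: float  # latent personality in [0, 1]; higher means more prone to jaywalk


def sample_pedestrians(n, rng):
    """Draw one latent trait per pedestrian at episode start.

    A uniform distribution is an assumption, not taken from the paper.
    """
    return [Pedestrian(i, rng.random()) for i in range(n)]


def jaywalk_probability(trait, base=0.02, scale=0.30):
    """Assumed linear map from trait to per-decision jaywalking probability."""
    return base + scale * trait


def vehicle_observation(ped, position):
    """What the vehicle is allowed to see: identity and position, never the trait."""
    return {"ped_id": ped.ped_id, "position": position}


def decides_to_jaywalk(ped, rng):
    """Bernoulli draw using the pedestrian's own trait-dependent probability."""
    return rng.random() < jaywalk_probability(ped.trait)


if __name__ == "__main__":
    rng = random.Random(0)
    peds = sample_pedestrians(12, rng)  # 12 pedestrians, as in the paper
    jaywalkers = [p.ped_id for p in peds if decides_to_jaywalk(p, rng)]
    print("pedestrians who jaywalk on this decision:", jaywalkers)
```

Because the trait never appears in `vehicle_observation`, the vehicle can only infer it from behavior, which is the source of the uncertainty the study targets.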

📝 Abstract

Simulation-based testing of self-driving cars (SDCs) typically relies on scripted or simplified pedestrian models that do not capture the heterogeneity and uncertainty of real human crossing behavior. This limits the realism of safety assessments, especially in scenarios involving jaywalking, which is governed by latent personality traits that the vehicle cannot observe. We hypothesize that jointly training pedestrians and the SDC with multi-agent reinforcement learning (MARL) produces more realistic interaction scenarios than training the SDC against fixed pedestrian policies, and that the resulting behavior gap between predictable and unpredictable crossings can be measured directly from trajectories. This paper describes a MARL environment in which an SDC and 12 pedestrians are co-trained using Multi-Agent Proximal Policy Optimization (MAPPO). Pedestrian locomotion follows scripted Dijkstra pathfinding, while an RL policy controls high-level go/wait decisions. Jaywalking probability depends on a per-pedestrian personality trait sampled at episode start and hidden from the SDC. In 500-episode evaluations, the co-trained SDC reached 78% of goals with a 14% collision rate, compared to 35% goals and 33% collisions for the best rule-based baseline. A speed differential metric shows that the SDC traveled 2.65 m/s faster near jaywalkers than near crosswalk users at close range (0-3 m), indicating that jaywalking encounters were not anticipated. Jaywalking accounted for 13% of crossing events but was associated with 62% of collisions. Co-training with MARL pedestrians reduced collisions by 30% relative to single-agent RL, as pedestrians learned to wait when the SDC approached at speed.
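As a rough illustration of how the two trajectory-level quantities in the abstract could be computed, the sketch below defines a speed differential and a relative collision rate. The abstract does not give the exact formulas, so the log format (a list of `(distance_m, sdc_speed_mps, is_jaywalker)` tuples), the use of simple means, and the function names `speed_differential` and `relative_collision_rate` are all assumptions. The second function also assumes that each collision is attributed to exactly one crossing event.

```python
from statistics import mean


def speed_differential(samples, max_dist=3.0):
    """Mean SDC speed near jaywalkers minus mean speed near crosswalk users.

    `samples` is a list of (distance_m, sdc_speed_mps, is_jaywalker) tuples,
    one per timestep and nearby pedestrian (a hypothetical log format).
    Only samples with 0 <= distance <= max_dist are counted.
    """
    jay = [v for d, v, j in samples if 0.0 <= d <= max_dist and j]
    legal = [v for d, v, j in samples if 0.0 <= d <= max_dist and not j]
    if not jay or not legal:
        raise ValueError("need at least one close-range sample in each group")
    return mean(jay) - mean(legal)


def relative_collision_rate(event_share, collision_share):
    """Collisions per crossing event for jaywalking, relative to other crossings.

    `event_share` is the fraction of crossing events that are jaywalking;
    `collision_share` is the fraction of collisions associated with jaywalking.
    """
    jay_rate = collision_share / event_share
    other_rate = (1.0 - collision_share) / (1.0 - event_share)
    return jay_rate / other_rate


if __name__ == "__main__":
    toy_log = [
        (1.0, 6.0, True),
        (2.0, 5.0, True),
        (1.5, 3.0, False),
        (2.5, 2.0, False),
        (10.0, 9.0, True),  # beyond 3 m, ignored
    ]
    print("speed differential (m/s):", speed_differential(toy_log))
    print("relative collision rate:", relative_collision_rate(0.13, 0.62))
```

Under these assumptions, the reported shares (13% of crossing events, 62% of collisions) correspond to a per-event collision rate roughly 11 times higher for jaywalking than for other crossings. The toy log values are invented for illustration and are not data from the paper.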

Problem

Research questions and friction points this paper is trying to address.

pedestrian behavioral uncertainty

autonomous driving safety

jaywalking

multi-agent reinforcement learning

simulation-based testing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Reinforcement Learning

Pedestrian Behavioral Uncertainty

Autonomous Driving Safety