🤖 AI Summary
Urban bus network design is an NP-hard problem, and conventional manual planning struggles with its very large solution space. This paper proposes an end-to-end reinforcement learning (RL) framework for automated bus network design that uses a Graph Attention Network (GAT) to construct routes sequentially. It introduces a two-level reward mechanism that combines incremental topological feedback with simulation-based terminal rewards to address long-horizon credit assignment. The framework couples the Proximal Policy Optimization (PPO) algorithm with the Multi-Agent Transport Simulation (MATSim) platform and uses census-derived demand. It is evaluated at real-city scale on a new dataset from Bloomington, Indiana. Under high transit adoption with transit center initialization, the RL-designed network achieves a 25.6% higher service rate, 30.9% shorter average waiting time, and 21.0% better bus utilization than the existing real-world network. Under mixed-mode conditions with random initialization, it delivers 68.8% higher route efficiency than a demand coverage heuristic and 5.9% lower travel times than shortest path construction.
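For readers unfamiliar with PPO, its standard clipped surrogate objective (PPO-Clip) can be sketched as follows. This is the generic textbook form, not the paper's training code. The function name `ppo_clip_loss` and the default `clip_eps = 0.2` are illustrative assumptions, and the summary does not state which PPO variant or hyperparameters the authors use.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Generic PPO-Clip surrogate loss (negated so it can be minimized).

    logp_new / logp_old: log pi(a_t | s_t) under the current policy and the
    policy that collected the data; advantages: estimated A_t per step.
    clip_eps = 0.2 is a common default, assumed here for illustration.
    """
    n = len(advantages)
    if n == 0 or len(logp_new) != n or len(logp_old) != n:
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: keep the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / n
```

When the old and new log-probabilities match, the ratio is 1 and the loss reduces to the negative mean advantage. A ratio of 2 on a positive-advantage step is capped at `1 + clip_eps`, which limits how far a single update can move the route-construction policy.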
📝 Abstract
Designing efficient transit route networks is an NP-hard problem with an exponentially large solution space, and it has traditionally relied on manual planning. We present an end-to-end reinforcement learning (RL) framework based on graph attention networks for sequential transit network construction. To address the long-horizon credit assignment challenge, we introduce a two-level reward structure combining incremental topological feedback with simulation-based terminal rewards. We evaluate our approach on a new real-world dataset from Bloomington, Indiana, with topologically accurate road networks, census-derived demand, and existing transit routes. Our learned policies substantially outperform existing designs and traditional heuristics across two initialization schemes and two modal-split scenarios. Under high transit adoption with transit center initialization, our approach achieves 25.6% higher service rates, 30.9% shorter wait times, and 21.0% better bus utilization than the real-world network. Under mixed-mode conditions with random initialization, it delivers 68.8% higher route efficiency than demand coverage heuristics and 5.9% lower travel times than shortest path construction. These results demonstrate that end-to-end RL can design transit networks that substantially outperform both human-designed systems and hand-crafted heuristics on realistic city-scale benchmarks.
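The abstract does not specify how the two reward levels are combined. A minimal sketch, assuming the incremental topological reward is paid after every route-construction step and the simulation-based terminal reward (a score from simulating the finished network) is added once at the final step, shows how discounting carries the terminal signal back to early decisions. The names `two_level_returns`, `topo_rewards`, `terminal_reward`, and `terminal_weight` are hypothetical.

```python
from typing import List, Sequence


def two_level_returns(
    topo_rewards: Sequence[float],
    terminal_reward: float,
    gamma: float = 0.99,
    terminal_weight: float = 1.0,
) -> List[float]:
    """Discounted return G_t for each construction step t (hypothetical scheme).

    topo_rewards[t]: dense incremental topological reward after step t
    (assumed, e.g. new demand covered by extending a route).
    terminal_reward: sparse simulation-based score of the finished network,
    assumed to be added once at the last step, scaled by terminal_weight.
    """
    if not topo_rewards:
        return []
    rewards = list(topo_rewards)
    rewards[-1] += terminal_weight * terminal_reward  # dense + sparse at step T-1
    returns = [0.0] * len(rewards)
    g = 0.0
    # Backward recursion: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


print(two_level_returns([1.0, 0.0, 2.0], 10.0, gamma=0.5))  # [4.0, 6.0, 12.0]
```

Under this assumption, step t sees the terminal reward only after discounting by gamma^(T-1-t), so on long episodes the dense per-step topological term is one way to give early construction decisions a more immediate learning signal.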