Bi-level RL-Heuristic Optimization for Real-world Winter Road Maintenance

📅 2026-02-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the inefficiency and heavy reliance on manual decision-making in real-world winter road maintenance operations by proposing a bi-level optimization framework. The upper level employs reinforcement learning to cluster the road network and allocate resources, while the lower level solves a multi-objective vehicle routing problem within each cluster, incorporating vehicle constraints, depot capacity, and segment-specific demand, with explicit consideration of maximum travel time and carbon emissions. To the best of our knowledge, this is the first integration of reinforcement learning with heuristic algorithms for large-scale, real-world maintenance scenarios, achieving both high efficiency and scalability. Validation on major UK road networks—including the M25, M6, and A1—demonstrates significant reductions in carbon emissions and operational costs, ensures that maximum travel time remains under two hours, and achieves balanced workload distribution across maintenance teams.

📝 Abstract

Winter road maintenance is critical for ensuring public safety and reducing environmental impacts, yet existing methods struggle to manage large-scale routing problems effectively and mostly rely on human decision-making. This study presents a novel, scalable bi-level optimization framework, validated using real operational data from UK strategic road networks (M25, M6, A1), including the interconnected local road networks in surrounding areas that vehicles traverse, as part of the highway operator's efforts to solve existing planning challenges. At the upper level, a reinforcement learning (RL) agent strategically partitions the road network into manageable clusters and optimally allocates resources from multiple depots. At the lower level, a multi-objective vehicle routing problem (VRP) is solved within each cluster, minimizing the maximum vehicle travel time and total carbon emissions. Unlike existing approaches, our method handles large-scale, real-world networks efficiently, explicitly incorporating vehicle-specific constraints, depot capacities, and road segment requirements. Results demonstrate significant improvements, including balanced workloads, maximum travel times reduced below the targeted two-hour threshold, lower emissions, and substantial cost savings. This study illustrates how advanced AI-driven bi-level optimization can directly enhance operational decision-making in real-world transportation and logistics.
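
The bi-level structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the upper level here is a simple hill-climbing search standing in for the RL agent, the lower level is a nearest-neighbour heuristic with one vehicle per depot, and every name and constant (`EMISSION_KG_PER_HOUR`, the 50 km/h speed, the `weight` that combines the two objectives) is a hypothetical placeholder. Road segments are reduced to points in a plane, and vehicle and depot-capacity constraints are omitted.

```python
import math
import random

# Hypothetical placeholder: emissions are assumed proportional to driving time.
EMISSION_KG_PER_HOUR = 30.0


def travel_time(a, b, speed_kmh=50.0):
    """Hours to drive between two points given as (x, y) in km (assumed speed)."""
    return math.dist(a, b) / speed_kmh


def route_cluster(depot, segments):
    """Lower level: nearest-neighbour tour from the depot through all
    segments and back. Returns (travel_time_hours, emissions_kg)."""
    remaining = list(segments)
    pos, hours = depot, 0.0
    while remaining:
        nxt = min(remaining, key=lambda s: travel_time(pos, s))
        hours += travel_time(pos, nxt)
        pos = nxt
        remaining.remove(nxt)
    hours += travel_time(pos, depot)
    return hours, hours * EMISSION_KG_PER_HOUR


def evaluate(assignment, depots, segments, weight=0.01):
    """Score a partition as max route time plus weighted total emissions.
    Returns (score, max_hours, total_emissions)."""
    clusters = {d: [] for d in range(len(depots))}
    for seg, d in zip(segments, assignment):
        clusters[d].append(seg)
    results = [route_cluster(depots[d], segs) for d, segs in clusters.items()]
    max_hours = max(h for h, _ in results)
    total_em = sum(e for _, e in results)
    return max_hours + weight * total_em, max_hours, total_em


def upper_level_search(depots, segments, iters=500, seed=0):
    """Upper level (stand-in for the RL agent): hill-climbing over
    segment-to-depot assignments, starting from nearest-depot assignment."""
    rng = random.Random(seed)
    best = [
        min(range(len(depots)), key=lambda d: travel_time(depots[d], s))
        for s in segments
    ]
    best_score = evaluate(best, depots, segments)[0]
    for _ in range(iters):
        cand = best[:]
        # Move one randomly chosen segment to a randomly chosen depot.
        cand[rng.randrange(len(cand))] = rng.randrange(len(depots))
        score = evaluate(cand, depots, segments)[0]
        if score < best_score:
            best, best_score = cand, score
    return best, best_score
```

Calling `upper_level_search(depots, segments)` with lists of `(x, y)` tuples returns a depot index per segment and the combined score. Each candidate partition is scored by actually routing every cluster, which is the feedback loop that makes the problem bi-level: the upper-level decision is only as good as the routes the lower level can build from it.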

Problem

Research questions and friction points this paper is trying to address.

winter road maintenance

large-scale routing

resource allocation

vehicle routing problem

operational decision-making

Innovation

Methods, ideas, or system contributions that make the work stand out.

bi-level optimization

reinforcement learning

vehicle routing problem