🤖 AI Summary
This work addresses tactical decision-making for autonomous trucks in highway scenarios. We propose a total cost of operation (TCOP)-driven deep reinforcement learning framework, decoupling high-level decision policies from low-level physical-model-based controllers and employing the Proximal Policy Optimization (PPO) algorithm. To our knowledge, this is the first study to formulate TCOP as an end-to-end multi-objective reward function, explicitly balancing fuel consumption (economy), collision rate (safety), and traffic throughput (efficiency). We further integrate adaptive reward weighting, component-wise normalization, and curriculum learning to jointly optimize these objectives. Experimental results in high-fidelity simulation demonstrate that the proposed framework reduces fuel consumption by 8.2% and collision rate by 91% compared to sparse-reward baselines, while significantly improving generalization and real-world deployability. These findings validate the effectiveness and engineering practicality of TCOP-guided reward design for commercial vehicle autonomous driving.
📝 Abstract
We develop a deep reinforcement learning framework for tactical decision making in an autonomous truck, specifically for Adaptive Cruise Control (ACC) and lane change maneuvers in a highway scenario. Our results demonstrate that it is beneficial to separate high-level decision making from low-level control: the reinforcement learning agent makes the tactical decisions, while low-level controllers based on physical models execute the control actions. We then study how to optimize performance under a realistic, multi-objective reward function based on the Total Cost of Operation (TCOP) of the truck, using three approaches: adding weights to the reward components, normalizing the reward components, and using curriculum learning techniques.
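To make the three reward-shaping approaches concrete, the following is a minimal sketch of how a weighted, normalized multi-objective reward and a simple curriculum schedule might be combined. It is not the authors' implementation: the component names (`fuel_cost`, `collided`, `distance_m`), the reference scales, the clipping range, and the linear ramp are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RewardWeights:
    # Hypothetical per-component weights; the paper's actual values are not given here.
    fuel: float = 1.0
    collision: float = 1.0
    progress: float = 1.0


def normalize(value: float, scale: float) -> float:
    """Divide by an assumed reference magnitude and clip to [-1, 1]."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return max(-1.0, min(1.0, value / scale))


def tcop_style_reward(fuel_cost: float, collided: bool, distance_m: float,
                      weights: RewardWeights, scales: dict) -> float:
    """Weighted sum of normalized components (illustrative only)."""
    fuel_term = -normalize(fuel_cost, scales["fuel"])            # cost, so negative
    collision_term = -1.0 if collided else 0.0                   # safety penalty
    progress_term = normalize(distance_m, scales["distance"])    # reward for distance covered
    return (weights.fuel * fuel_term
            + weights.collision * collision_term
            + weights.progress * progress_term)


def curriculum_weight(step: int, ramp_steps: int, start: float, end: float) -> float:
    """Linearly ramp a weight from start to end over ramp_steps (assumed schedule)."""
    if ramp_steps <= 0:
        return end
    frac = min(1.0, max(0.0, step / ramp_steps))
    return start + frac * (end - start)
```

In a training loop, one could, for example, call `curriculum_weight` each episode to raise the fuel weight gradually, so that the agent first learns to drive safely and only later is pushed to trade speed against fuel cost. Whether this ordering matches the curriculum used in the paper is not stated in the abstract.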