TADPO: Reinforcement Learning Goes Off-road

📅 2026-03-06
🤖 AI Summary
This work addresses long-horizon planning and adaptive control for off-road driving on unmapped, variable terrain with uncertain dynamics, a setting where low-signal rewards make reinforcement learning agents hard to train. The authors propose TADPO, a policy gradient formulation that extends Proximal Policy Optimization (PPO) by combining off-policy trajectories for teacher guidance with on-policy trajectories for student exploration. On top of TADPO they build an end-to-end, vision-based reinforcement learning system for high-speed off-road driving. In simulation the system navigates extreme slopes and obstacle-rich terrain, and the learned policy transfers zero-shot to a full-scale off-road vehicle. To the authors' knowledge, this is the first deployment of RL-based policies on a full-scale off-road platform.

📝 Abstract
Off-road autonomous driving poses significant challenges such as navigating unmapped, variable terrain with uncertain and diverse dynamics. Addressing these challenges requires effective long-horizon planning and adaptable control. Reinforcement Learning (RL) offers a promising solution by learning control policies directly from interaction. However, because off-road driving is a long-horizon task with low-signal rewards, standard RL methods are challenging to apply in this setting. We introduce TADPO, a novel policy gradient formulation that extends Proximal Policy Optimization (PPO), leveraging off-policy trajectories for teacher guidance and on-policy trajectories for student exploration. Building on this, we develop a vision-based, end-to-end RL system for high-speed off-road driving, capable of navigating extreme slopes and obstacle-rich terrain. We demonstrate our performance in simulation and, importantly, zero-shot sim-to-real transfer on a full-scale off-road vehicle. To our knowledge, this work represents the first deployment of RL-based policies on a full-scale off-road platform.
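
As a rough illustration of the idea in the abstract, the sketch below computes the standard PPO clipped surrogate and applies it to two batches: on-policy student samples and off-policy teacher samples. The abstract does not give the TADPO objective, so the way the two terms are combined here, along with the names `ppo_clip_term`, `mixed_objective`, and `teacher_weight`, are assumptions made for illustration and should not be read as the authors' formulation.

```python
import math


def ppo_clip_term(logp_new, logp_old, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single sample (to be maximized).

    logp_new:  log-probability of the action under the policy being updated
    logp_old:  log-probability under the policy that generated the sample
    advantage: advantage estimate for the sample
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def mixed_objective(student_batch, teacher_batch, eps=0.2, teacher_weight=0.5):
    """Hypothetical combination of on-policy and off-policy surrogate terms.

    Each batch is a list of (logp_new, logp_old, advantage) tuples. For teacher
    samples, logp_old is the teacher's log-probability of the action it took.
    This weighting scheme is an assumption, NOT the TADPO loss from the paper.
    """
    on_policy = sum(ppo_clip_term(*s, eps=eps) for s in student_batch) / len(student_batch)
    off_policy = sum(ppo_clip_term(*t, eps=eps) for t in teacher_batch) / len(teacher_batch)
    return on_policy + teacher_weight * off_policy


if __name__ == "__main__":
    student = [(0.0, 0.0, 1.0), (math.log(2.0), 0.0, 1.0)]
    teacher = [(0.0, 0.0, 2.0)]
    print(mixed_objective(student, teacher))
```

With a probability ratio of 1 the surrogate reduces to the advantage, and a ratio of 2 with a positive advantage is clipped to 1 + eps. This clipping is what PPO relies on to limit the size of each policy update.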

Problem

Research questions and friction points this paper is trying to address.

off-road autonomous driving
long-horizon planning
reinforcement learning
sparse rewards
unstructured terrain

Innovation

Methods, ideas, or system contributions that make the work stand out.

TADPO
off-road autonomous driving
reinforcement learning
sim-to-real transfer
policy gradient

Authors

Zhouchonghao Wu
Robotics Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
Raymond Song
Robotics Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
Vedant Mundheda
Robotics Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213
Luis E. Navarro-Serment
National Robotics Engineering Center, Robotics Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15201
Christof Schoenborn
National Robotics Engineering Center, Robotics Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15201
Jeff Schneider
Carnegie Mellon University
Machine Learning