🤖 AI Summary
Problem: Autonomous excavator deployment is limited by poor cross-scene generalization, high hardware adaptation costs, and highly engineered controllers that require extensive manual tuning. Method: This paper introduces ExT, the first large-scale multi-task pretraining framework tailored for heavy construction machinery. ExT models excavation tasks and heterogeneous equipment in a unified way, enabling rapid cross-task and cross-device transfer as well as continual learning. Policies are pretrained on large-scale expert demonstration data, then adapted with supervised fine-tuning (SFT) or reinforcement learning fine-tuning (RLFT) and transferred from simulation to real machines (Sim2Real). Results: ExT achieves centimeter-level accuracy in end-to-end excavation cycles both in simulation and on a real excavator, matching the performance of specialized single-task controllers. In simulation, fine-tuning for new tasks requires only minimal data and preserves prior task performance, improving system scalability and deployment efficiency.
📝 Abstract
Scaling up the deployment of autonomous excavators is of great economic and societal importance. Yet it remains a challenging problem, as effective systems must robustly handle unseen worksite conditions and new hardware configurations. Current state-of-the-art approaches rely on highly engineered, task-specific controllers, which require extensive manual tuning for each new scenario. In contrast, recent advances in large-scale pretrained models have shown remarkable adaptability across tasks and embodiments in domains such as manipulation and navigation, but their applicability to heavy construction machinery remains largely unexplored. In this work, we introduce ExT, a unified open-source framework for large-scale demonstration collection, pretraining, and fine-tuning of multitask excavation policies. ExT policies are first trained on large-scale demonstrations collected from a mix of experts, then fine-tuned either with supervised fine-tuning (SFT) or reinforcement learning fine-tuning (RLFT) to specialize to new tasks or operating conditions. Through both simulation and real-world experiments, we show that pretrained ExT policies can execute complete excavation cycles with centimeter-level accuracy, successfully transferring from simulation to a real machine with performance comparable to specialized single-task controllers. Furthermore, in simulation, we demonstrate that ExT's fine-tuning pipelines allow rapid adaptation to new tasks, out-of-distribution conditions, and machine configurations, while maintaining strong performance on previously learned tasks. These results highlight the potential of ExT to serve as a foundation for scalable and generalizable autonomous excavation.