🤖 AI Summary
This work investigates trajectory imitation learning in the complex 3D game *Bleeding Edge*, addressing robustness challenges arising from environmental stochasticity and train-deploy distribution shift. Inverse dynamics models (IDMs) with different visual encoders and policy heads are applied to trajectory following, and several future alignment strategies are investigated to mitigate the distribution shift caused by aleatoric uncertainty and imperfections of the agent. We systematically evaluate combinations of visual encoders (DINOv2 versus trained from scratch) and policy heads (GPT-style autoregressive versus MLP-style feedforward). Results show that, with sufficient diverse data, a from-scratch encoder paired with a GPT-style head performs best; in the low-data regime, DINOv2 with a GPT-style head is superior; and under pretraining followed by behaviour-specific fine-tuning, both head types achieve comparable performance. Additionally, we measure trajectory deviation distance and the first significant deviation point between reference and agent trajectories, enabling fine-grained robustness analysis for imitation learning. Our study provides principled insights into architecture selection and evaluation methodology for robust behavioral cloning in stochastic 3D environments.
📝 Abstract
Imitation learning is a powerful tool for training agents by leveraging expert knowledge, and the ability to replicate a given trajectory is an integral part of it. In complex environments, like modern 3D video games, distribution shift and stochasticity necessitate robust approaches beyond simple action replay. In this study, we apply Inverse Dynamics Models (IDMs) with different encoders and policy heads to trajectory following in a modern 3D video game -- Bleeding Edge. Additionally, we investigate several future alignment strategies that address the distribution shift caused by aleatoric uncertainty and imperfections of the agent. We measure both the trajectory deviation distance and the first significant deviation point between the reference and the agent's trajectory and show that the optimal configuration depends on the chosen setting. Our results show that in a diverse data setting, a GPT-style policy head with an encoder trained from scratch performs best; a DINOv2 encoder with the GPT-style policy head gives the best results in the low-data regime; and both GPT-style and MLP-style policy heads achieve comparable results when pre-trained on a diverse setting and fine-tuned for a specific behaviour setting.
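To make the two evaluation quantities concrete, here is a minimal sketch of how one could compute a trajectory deviation distance and a first significant deviation point from logged positions. The exact definitions used in the paper are not given here; the nearest-reference-point distance, the `threshold` parameter, and the function name are illustrative assumptions.

```python
import numpy as np

def trajectory_deviation(reference, rollout, threshold=1.0):
    """Illustrative deviation metrics between a reference trajectory and
    an agent rollout, both given as [T, D] arrays of world positions.

    Returns a tuple (mean_dev, first_dev):
      mean_dev  -- mean distance from each rollout point to its nearest
                   reference point (one plausible "deviation distance").
      first_dev -- index of the first rollout step whose nearest-reference
                   distance exceeds `threshold`, or None if never exceeded.
    """
    reference = np.asarray(reference, dtype=float)
    rollout = np.asarray(rollout, dtype=float)
    # Pairwise distances between every rollout point and every reference point.
    diffs = rollout[:, None, :] - reference[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1).min(axis=1)  # per-step nearest distance
    mean_dev = float(dists.mean())
    exceeded = np.nonzero(dists > threshold)[0]
    first_dev = int(exceeded[0]) if exceeded.size else None
    return mean_dev, first_dev

# Example: the agent tracks a straight line, then drifts off at the last step.
ref = [[0, 0, 0], [1, 0, 0], [2, 0, 0]]
agent = [[0, 0, 0], [1, 0, 0], [2, 2, 0]]
mean_dev, first_dev = trajectory_deviation(ref, agent, threshold=1.0)
```

A lower `mean_dev` indicates tighter tracking overall, while `first_dev` localizes where the agent first leaves the reference, which is what enables the fine-grained, setting-dependent comparison of encoder and policy-head combinations described above.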