🤖 AI Summary
This work investigates trajectory imitation learning in the complex 3D game *Bleeding Edge*, addressing robustness challenges arising from environmental stochasticity and train-deploy distribution shift. Inverse dynamics models (IDMs) with different visual encoders and policy heads are applied to trajectory following, and several future alignment strategies are investigated to mitigate the distribution shift caused by aleatoric uncertainty and imperfections of the agent. We systematically evaluate combinations of visual encoders (DINOv2 versus trained from scratch) and policy heads (GPT-style autoregressive versus MLP-style feedforward). Results show that, with sufficient diverse data, a from-scratch encoder paired with a GPT-style head performs best; in the low-data regime, DINOv2 with a GPT-style head is superior; and under pretraining followed by behaviour-specific fine-tuning, both head types achieve comparable performance. Additionally, we measure trajectory deviation distance and the first significant deviation point between reference and agent trajectories, enabling fine-grained robustness analysis for imitation learning. Our study provides principled insights into architecture selection and evaluation methodology for robust behavioral cloning in stochastic 3D environments.
📝 Abstract
Imitation learning is a powerful tool for training agents by leveraging expert knowledge, and the ability to replicate a given trajectory is an integral part of it. In complex environments, like modern 3D video games, distribution shift and stochasticity necessitate robust approaches beyond simple action replay. In this study, we apply Inverse Dynamics Models (IDMs) with different encoders and policy heads to trajectory following in a modern 3D video game -- Bleeding Edge. Additionally, we investigate several future alignment strategies that address the distribution shift caused by aleatoric uncertainty and imperfections of the agent. We measure both the trajectory deviation distance and the first significant deviation point between the reference and the agent's trajectory and show that the optimal configuration depends on the chosen setting. Our results show that in a diverse data setting, a GPT-style policy head with an encoder trained from scratch performs best; a DINOv2 encoder with the GPT-style policy head gives the best results in the low-data regime; and both GPT-style and MLP-style policy heads achieve comparable results when pre-trained on a diverse setting and fine-tuned for a specific behaviour setting.
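To make the two evaluation quantities concrete, here is a minimal sketch of how one could compute a trajectory deviation distance and a first significant deviation point from logged positions. The exact definitions used in the paper are not given here; the nearest-reference-point distance, the `threshold` parameter, and the function name are illustrative assumptions.

```python
import numpy as np

def trajectory_deviation(reference, rollout, threshold=1.0):
    """Illustrative deviation metrics between a reference trajectory and
    an agent rollout, both given as [T, D] arrays of world positions.

    Returns a tuple (mean_dev, first_dev):
      mean_dev  -- mean distance from each rollout point to its nearest
                   reference point (one plausible "deviation distance").
      first_dev -- index of the first rollout step whose nearest-reference
                   distance exceeds `threshold`, or None if never exceeded.
    """
    reference = np.asarray(reference, dtype=float)
    rollout = np.asarray(rollout, dtype=float)
    # Pairwise distances between every rollout point and every reference point.
    diffs = rollout[:, None, :] - reference[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1).min(axis=1)  # per-step nearest distance
    mean_dev = float(dists.mean())
    exceeded = np.nonzero(dists > threshold)[0]
    first_dev = int(exceeded[0]) if exceeded.size else None
    return mean_dev, first_dev

# Example: the agent tracks a straight line, then drifts off at the last step.
ref = [[0, 0, 0], [1, 0, 0], [2, 0, 0]]
agent = [[0, 0, 0], [1, 0, 0], [2, 2, 0]]
mean_dev, first_dev = trajectory_deviation(ref, agent, threshold=1.0)
```

A lower `mean_dev` indicates tighter tracking overall, while `first_dev` localizes where the agent first leaves the reference, which is what enables the fine-grained, setting-dependent comparison of encoder and policy-head combinations described above.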