🤖 AI Summary
Existing imitation learning benchmarks lack sufficient distributional shift between training and evaluation, hindering meaningful assessment of generalization. This paper introduces Labyrinth: a discrete, fully observable maze environment with controllable structure, precisely adjustable start and goal positions, and tunable task complexity, supporting optimal-action labeling and deterministic environment generation. Its key innovation is systematic, orthogonal control over generalization dimensions (such as partial observability, key-and-door tasks, and slippery ice floors) posed as out-of-distribution challenges, with strict separation of training, validation, and test sets. This design improves experimental reproducibility and the interpretability of results. Empirical evaluation across multiple baselines shows that the benchmark effectively discriminates between algorithms' generalization capabilities. Labyrinth thus establishes a standardized, verifiable benchmark for assessing robustness in imitation learning.
📝 Abstract
Imitation learning benchmarks often lack sufficient variation between training and evaluation, limiting meaningful generalisation assessment. We introduce Labyrinth, a benchmarking environment designed to test generalisation with precise control over structure, start and goal positions, and task complexity. It enables verifiably distinct training, evaluation, and test settings. Labyrinth provides a discrete, fully observable state space and known optimal actions, supporting interpretability and fine-grained evaluation. Its flexible setup allows targeted testing of generalisation factors and includes variants like partial observability, key-and-door tasks, and ice-floor hazards. By enabling controlled, reproducible experiments, Labyrinth advances the evaluation of generalisation in imitation learning and provides a valuable tool for developing more robust agents.
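To make the described design concrete, here is a minimal sketch of what a deterministic, fully observable grid environment with known optimal actions might look like. This is a hypothetical illustration, not the paper's actual API: the class name `LabyrinthLikeEnv`, its constructor arguments (including the `seed` parameter used to make generation reproducible), and its methods are all assumptions, and walls/mazes are omitted so the "optimal action" reduces to greedy movement toward the goal.

```python
import random

class LabyrinthLikeEnv:
    """Illustrative sketch of a seeded grid environment.

    Hypothetical API, not the paper's implementation: class name,
    arguments, and methods are assumptions for illustration only.
    """

    # action id -> (row delta, col delta): up, down, left, right
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, size=7, seed=0):
        self.size = size
        rng = random.Random(seed)  # deterministic generation from the seed
        self.start = (0, 0)
        # place the goal pseudo-randomly but reproducibly (never at the start)
        self.goal = (rng.randrange(1, size), rng.randrange(1, size))
        self.pos = self.start

    def step(self, action):
        """Apply an action; return (new position, episode-done flag)."""
        dr, dc = self.ACTIONS[action]
        r, c = self.pos
        nr, nc = r + dr, c + dc
        if 0 <= nr < self.size and 0 <= nc < self.size:  # stay on the grid
            self.pos = (nr, nc)
        return self.pos, self.pos == self.goal

    def optimal_action(self):
        """Known-optimal action label: in an open grid, step toward the goal.

        A real maze with walls would instead label actions via BFS
        shortest paths over the generated layout.
        """
        (r, c), (gr, gc) = self.pos, self.goal
        if r != gr:
            return 1 if r < gr else 0
        return 3 if c < gc else 2
```

Two instances built with the same seed produce identical layouts, which is what allows verifiably distinct (non-overlapping) train/validation/test splits: split the seed space, and no layout can leak across splits. Rolling out `optimal_action` from the start reaches the goal, giving exact expert labels for imitation learning.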