Split the Differences, Pool the Rest: Provably Efficient Multi-Objective Imitation

📅 2026-05-12
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the problem of recovering policies on the Pareto front from demonstrations by multiple Pareto-optimal experts, while avoiding the dominated policies that can result from naively aggregating conflicting expert trajectories. The authors propose Multi-Output Augmented Behavioral Cloning (MA-BC), an algorithm that partitions expert data where behavior diverges and pools state-action pairs where no conflict is observed. They establish a novel minimax lower bound for multi-objective imitation learning and prove that MA-BC converges at a faster statistical rate than any learner that treats each expert dataset independently, matching that lower bound. Empirical evaluations on discrete environments and a continuous Linear Quadratic Regulator (LQR) control task demonstrate that MA-BC significantly outperforms approaches that learn each expert's policy independently, with experimental results aligning closely with the theoretical guarantees.
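A toy illustration may help show why naive aggregation can produce a dominated policy. The sketch below is not taken from the paper: the one-state problem, its three actions, and all return values are hypothetical numbers chosen only to make the effect visible.

```python
import numpy as np

def dominates(u, v):
    """Return True if return vector u Pareto-dominates v (maximization)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return bool(np.all(u >= v) and np.any(u > v))

# Hypothetical one-state problem: three actions, two objectives.
returns = np.array([
    [1.0, 0.0],  # action 0, chosen by expert A (Pareto-optimal)
    [0.0, 1.0],  # action 1, chosen by expert B (Pareto-optimal)
    [0.6, 0.6],  # action 2, a balanced alternative (also Pareto-optimal)
])

# Naive cloning on pooled data: half the demonstrations show action 0,
# half show action 1, so the cloned policy mixes them evenly.
pooled_policy = np.array([0.5, 0.5, 0.0])
pooled_return = pooled_policy @ returns  # expected return per objective

# The mixture is dominated by action 2, although neither expert is.
print(dominates(returns[2], pooled_return))
print(dominates(returns[2], returns[0]), dominates(returns[2], returns[1]))
```

In this assumed setup both experts act Pareto-optimally, yet the policy cloned from their pooled demonstrations is strictly worse on both objectives than a single available action.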
📝 Abstract
This work investigates multi-objective imitation learning: the problem of recovering policies that lie on the Pareto front given demonstrations from multiple Pareto-optimal experts in a Multi-Objective Markov Decision Process (MOMDP). Standard imitation approaches are ill-equipped for this regime, as naively aggregating conflicting expert trajectories can result in dominated policies. To address this, we introduce Multi-Output Augmented Behavioral Cloning (MA-BC), an algorithm that systematically partitions divergent expert data while pooling state-action pairs where no behavior conflict is observed. Theoretically, we prove that MA-BC converges to Pareto-optimal policies at a faster statistical rate than any learner that considers each expert dataset independently. Furthermore, we establish a novel lower bound for multi-objective imitation learning, demonstrating that MA-BC is minimax optimal. Finally, we empirically validate our algorithm across diverse discrete environments and, guided by our theoretical insights, extend and evaluate MA-BC on a continuous Linear Quadratic Regulator (LQR) control task.
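The partition-and-pool idea described in the abstract can be sketched for a tabular setting. This is a minimal interpretation, not the authors' MA-BC implementation: it assumes deterministic experts and hashable discrete states and actions, and the function name `split_and_pool_bc` and the conflict rule (more than one distinct action observed in a state) are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def split_and_pool_bc(datasets):
    """Sketch of partition-and-pool cloning (assumed tabular setting).

    datasets: one list of (state, action) pairs per expert.
    Returns one policy per expert as a dict mapping state -> action.
    """
    # Count the actions each expert was observed taking in each state.
    per_expert = [defaultdict(Counter) for _ in datasets]
    for k, data in enumerate(datasets):
        for state, action in data:
            per_expert[k][state][action] += 1

    # Pool the counts over all experts.
    pooled = defaultdict(Counter)
    for counts in per_expert:
        for state, c in counts.items():
            pooled[state].update(c)

    # Assumed conflict rule: more than one distinct action seen in a state.
    conflict = {s for s, c in pooled.items() if len(c) > 1}

    policies = []
    for counts in per_expert:
        policy = {}
        for state, c in pooled.items():
            if state not in conflict:
                # No observed conflict: share the single agreed action.
                policy[state] = next(iter(c))
            elif state in counts:
                # Conflict: fall back to this expert's own data only.
                policy[state] = counts[state].most_common(1)[0][0]
            # Conflict states this expert never visited stay undefined.
        policies.append(policy)
    return policies

datasets = [
    [("s0", "left"), ("s1", "up")],      # expert A
    [("s0", "right"), ("s2", "down")],   # expert B
]
policies = split_and_pool_bc(datasets)
print(policies[0])
print(policies[1])
```

In this example the experts disagree only in `s0`, so each cloned policy keeps its own expert's action there while inheriting the other expert's action in the state it never visited. That sharing of non-conflicting data is the intuition behind learning faster than treating each dataset independently.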
Problem

Research questions and friction points this paper is trying to address.

multi-objective imitation learning
Pareto front
MOMDP
expert demonstrations
behavioral conflict
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-objective imitation learning
Pareto optimality
behavioral cloning
minimax optimality
MOMDP