🤖 AI Summary
This work addresses the challenge of computing a Pareto front of policies in multi-objective reinforcement learning. The authors propose Iterated Pareto Referent Optimisation (IPRO), a framework that decomposes Pareto front approximation into a sequence of constrained single-objective problems. IPRO guarantees convergence and, at each iteration, provides an upper bound on the distance to undiscovered Pareto-optimal solutions. It requires no prior preference information or convexity assumptions and is compatible with arbitrary single-objective solvers. Empirical evaluation on standard benchmarks shows that IPRO matches or outperforms methods that require additional assumptions, in both hypervolume and utility-based metrics. By leveraging problem-specific solvers, the approach also generalises beyond reinforcement learning, e.g., to planning and pathfinding.
📝 Abstract
An important challenge in multi-objective reinforcement learning is obtaining a Pareto front of policies to attain optimal performance under different preferences. We introduce Iterated Pareto Referent Optimisation (IPRO), which decomposes finding the Pareto front into a sequence of constrained single-objective problems. This enables us to guarantee convergence while providing an upper bound on the distance to undiscovered Pareto-optimal solutions at each step. We evaluate IPRO on utility-based metrics and hypervolume, and find that it matches or outperforms methods that require additional assumptions. By leveraging problem-specific single-objective solvers, our approach also holds promise for applications beyond multi-objective reinforcement learning, such as planning and pathfinding.
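To make the decomposition idea concrete, the following is a minimal toy sketch, not the paper's algorithm: each "referent" point spawns a single-objective subproblem (here, an augmented Chebyshev achievement scalarising function over a finite candidate set), and solutions that strictly improve on their referent are collected into a front. The candidate set, referent choices, and scalarisation are illustrative assumptions.

```python
# Toy sketch of Pareto-front-by-decomposition: each referent defines a
# single-objective subproblem; solutions improving on the referent are kept.
# The candidates, referents, and scalarisation here are illustrative only.

def asf(v, r, rho=1e-3):
    """Augmented Chebyshev achievement scalarising function: worst-case
    improvement over the referent, with a small sum term to break ties
    in favour of Pareto-optimal points."""
    diffs = [vi - ri for vi, ri in zip(v, r)]
    return min(diffs) + rho * sum(diffs)

def dominates(a, b):
    """True if value vector a Pareto-dominates b."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def approximate_pareto_front(candidates, referents):
    front = []
    for r in referents:
        # Single-objective subproblem: maximise the scalarised value.
        best = max(candidates, key=lambda v: asf(v, r))
        # Keep only solutions that strictly improve on the referent.
        if all(b > ri for b, ri in zip(best, r)) and best not in front:
            front.append(best)
    # Discard any points dominated by others found across subproblems.
    return [p for p in front if not any(dominates(q, p) for q in front if q != p)]

# Toy bi-objective problem: each tuple is a candidate policy's value vector.
candidates = [(0, 3), (1, 2), (2, 2), (3, 0), (1, 1)]
referents = [(-1, 2.5), (0.5, 1.5), (2.5, -1)]
print(approximate_pareto_front(candidates, referents))
# → [(0, 3), (2, 2), (3, 0)]
```

In IPRO proper, the single-objective subproblems are solved by an arbitrary RL (or planning) solver and the referents are chosen so that each iteration tightens a bound on the distance to undiscovered Pareto-optimal solutions; this sketch replaces both with naive enumeration over a fixed grid.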