🤖 AI Summary
This work addresses sparse rewards, weak credit assignment, and low sample efficiency in heterogeneous multi-objective reinforcement learning (MORL), where objectives differ sharply in reward time scales. The authors propose PRISM (Parallel Reward Integration with Symmetry), which enforces reflectional symmetry as an inductive bias when aligning reward channels. PRISM introduces ReSymNet, a residual architecture that learns a scaled opportunity value to accelerate exploration while preserving the optimal policy, together with SymReg, a reflection-equivariance regulariser that restricts the policy search to a reflection-equivariant subspace and provably reduces hypothesis complexity. On MuJoCo benchmarks, PRISM improves the hypervolume metric by over 100% relative to a sparse-reward baseline and by up to 32% over an oracle trained with full dense rewards, enhancing both the coverage and the distributional uniformity of the Pareto front.
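The hypervolume metric cited above can be made concrete: for a two-objective maximisation problem, it is the area dominated by the approximate Pareto front relative to a fixed reference point, so larger fronts (better coverage) score higher. A minimal 2-D sketch follows; it is illustrative only and not the paper's evaluation code, and the point sets and reference point are invented for the example.

```python
def hypervolume_2d(points, ref):
    """Hypervolume (dominated area) of a 2-D maximisation front w.r.t. reference `ref`.

    Sweeps points in decreasing order of the first objective, accumulating the
    area of each vertical strip up to the best second objective seen so far.
    Dominated points contribute nothing. Illustrative sketch, not the paper's code.
    """
    pts = sorted(points, key=lambda p: p[0], reverse=True)
    hv, best_y = 0.0, ref[1]
    for i, (x, y) in enumerate(pts):
        best_y = max(best_y, y)  # tallest rectangle covering this strip
        next_x = pts[i + 1][0] if i + 1 < len(pts) else ref[0]
        hv += (x - next_x) * (best_y - ref[1])
    return hv

# Front [(1, 3), (2, 2), (3, 1)] against reference (0, 0) covers area 6.
hv = hypervolume_2d([(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)], (0.0, 0.0))
```

A reported "hypervolume gain exceeding 100%" then simply means the method's front covers more than twice the area of the baseline's front under the same reference point.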
📝 Abstract
This work studies heterogeneous Multi-Objective Reinforcement Learning (MORL), where objectives can differ sharply in temporal frequency. Such heterogeneity allows dense objectives to dominate learning, while sparse long-horizon rewards receive weak credit assignment, leading to poor sample efficiency. We propose a Parallel Reward Integration with Symmetry (PRISM) algorithm that enforces reflectional symmetry as an inductive bias in aligning reward channels. PRISM introduces ReSymNet, a theory-motivated model that reconciles temporal-frequency mismatches across objectives, using residual blocks to learn a scaled opportunity value that accelerates exploration while preserving the optimal policy. We also propose SymReg, a reflectional equivariance regulariser that enforces agent mirroring and constrains policy search to a reflection-equivariant subspace. This restriction provably reduces hypothesis complexity and improves generalisation. Across MuJoCo benchmarks, PRISM consistently outperforms both a sparse-reward baseline and an oracle trained with full dense rewards, improving Pareto coverage and distributional balance: it achieves hypervolume gains exceeding 100\% over the baseline and up to 32\% over the oracle. The code is at \href{https://github.com/EVIEHub/PRISM}{https://github.com/EVIEHub/PRISM}.
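The abstract does not give SymReg's exact form, but a reflection-equivariance regulariser of the kind it describes can be sketched as a penalty on the gap between the policy's action on a mirrored state and the mirrored action. Everything below is an assumption for illustration: the function name, the linear toy policies, and the mirror matrices are not from the paper.

```python
import numpy as np

def symreg_loss(policy, states, mirror_s, mirror_a):
    """Reflection-equivariance penalty (illustrative form, not the paper's exact loss).

    Penalises || policy(M_s s) - M_a policy(s) ||^2 averaged over a batch of
    states, where M_s / M_a mirror states and actions respectively.
    """
    a = policy(states)                    # actions for the original states
    a_ref = policy(states @ mirror_s.T)   # actions for the mirrored states
    gap = a_ref - a @ mirror_a.T          # deviation from equivariance
    return float(np.mean(gap ** 2))

# Toy setup: the mirror flips the sign of the second coordinate.
M_s = np.diag([1.0, -1.0])
M_a = np.diag([1.0, -1.0])
rng = np.random.default_rng(0)
states = rng.normal(size=(64, 2))

W_eq = np.diag([2.0, 3.0])                  # satisfies M_a W = W M_s -> zero penalty
W_bad = np.array([[1.0, 1.0], [0.0, 1.0]])  # breaks the symmetry -> positive penalty

loss_eq = symreg_loss(lambda s: s @ W_eq.T, states, M_s, M_a)
loss_bad = symreg_loss(lambda s: s @ W_bad.T, states, M_s, M_a)
```

Driving such a penalty to zero confines the policy to a reflection-equivariant subspace, which is the mechanism the abstract credits for reduced hypothesis complexity and better generalisation.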