PRISM: Parallel Reward Integration with Symmetry for MORL

📅 2026-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of sparse rewards, difficult credit assignment, and low sample efficiency that arise in heterogeneous multi-objective reinforcement learning (MORL) when objectives differ in reward time scales. To tackle these issues, the authors propose PRISM (Parallel Reward Integration with Symmetry), presented as the first MORL framework to incorporate reflection symmetry as an inductive bias. Its model, ReSymNet, employs a residual architecture to learn a scaled opportunity value that accelerates exploration while preserving the optimal policy, and a reflection-equivariant regularizer (SymReg) constrains the policy search to reduce hypothesis-space complexity. On MuJoCo benchmarks, PRISM improves the hypervolume metric by over 100% relative to a sparse-reward baseline and by up to 32% relative to an oracle trained with full dense rewards, enhancing both the coverage and the distributional uniformity of the Pareto front.

📝 Abstract
This work studies heterogeneous Multi-Objective Reinforcement Learning (MORL), where objectives can differ sharply in temporal frequency. Such heterogeneity allows dense objectives to dominate learning, while sparse long-horizon rewards receive weak credit assignment, leading to poor sample efficiency. We propose a Parallel Reward Integration with Symmetry (PRISM) algorithm that enforces reflectional symmetry as an inductive bias in aligning reward channels. PRISM introduces ReSymNet, a theory-motivated model that reconciles temporal-frequency mismatches across objectives, using residual blocks to learn a scaled opportunity value that accelerates exploration while preserving the optimal policy. We also propose SymReg, a reflectional equivariance regulariser that enforces agent mirroring and constrains policy search to a reflection-equivariant subspace. This restriction provably reduces hypothesis complexity and improves generalisation. Across MuJoCo benchmarks, PRISM consistently outperforms both a sparse-reward baseline and an oracle trained with full dense rewards, improving Pareto coverage and distributional balance: it achieves hypervolume gains exceeding 100% over the baseline and up to 32% over the oracle. The code is at https://github.com/EVIEHub/PRISM.
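The equivariance idea behind SymReg can be illustrated with a small sketch: a policy is reflection-equivariant when acting on a mirrored state yields the mirrored action, so the regulariser penalises the residual between the two. The function and mirror matrices below are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np

def symreg_loss(policy, states, state_mirror, action_mirror, weight=1.0):
    """Hypothetical reflection-equivariance penalty: the policy's action on
    a mirrored state should equal the mirrored action on the original state,
    i.e. pi(M_s s) = M_a pi(s)."""
    actions = policy(states)                            # pi(s)
    actions_on_mirrored = policy(states @ state_mirror.T)  # pi(M_s s)
    residual = actions_on_mirrored - actions @ action_mirror.T
    return weight * np.mean(residual ** 2)

# Toy example: 1-D state and action, mirroring = sign flip.
# An odd linear policy pi(s) = 2s is exactly equivariant, so its penalty is 0.
if __name__ == "__main__":
    mirror = np.array([[-1.0]])
    states = np.linspace(-1.0, 1.0, 8).reshape(-1, 1)
    equivariant_policy = lambda s: 2.0 * s
    print(symreg_loss(equivariant_policy, states, mirror, mirror))
```

In practice such a penalty would be added to the policy-gradient objective, restricting the search to (approximately) reflection-equivariant policies, which is the hypothesis-space reduction the abstract refers to.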
Problem

Research questions and friction points this paper is trying to address.

Multi-Objective Reinforcement Learning
heterogeneous objectives
temporal frequency mismatch
credit assignment
sample efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Objective Reinforcement Learning
Reflectional Symmetry
Temporal Frequency Mismatch
Equivariance Regularization
ReSymNet