🤖 AI Summary
This work addresses the instability of multi-objective reinforcement learning in non-convex regions of the Pareto front, where linear scalarization cannot recover non-convex solutions and static nonlinear scalarizations such as Tchebycheff suffer from high gradient variance that destabilizes training. To overcome these limitations, the authors propose an adaptive smooth Tchebycheff framework that adjusts the curvature of the scalarization function according to the measured gradient conflict: it anneals toward the precise Tchebycheff scalarization when objectives are aligned and reverts to a smoother approximation under severe conflict. Combining an attention mechanism with a conflict-driven adaptive controller, the approach aims at stable and efficient optimization with nonlinear scalarization in non-convex regions. Experiments on a robotic covert visual search task show that it discovers Pareto-optimal policies that linear methods cannot reach and on which static nonlinear methods are unstable, improving the trade-off among objectives.
📝 Abstract
Multi-objective reinforcement learning in robotic domains requires balancing complex, non-convex trade-offs between conflicting objectives. While linear scalarization methods provide stability, they are theoretically incapable of recovering solutions within non-convex regions of the Pareto front. Conversely, static non-linear scalarizations (e.g., Tchebycheff) can theoretically access these regions but often suffer from severe gradient variance and optimization instability in deep RL. In this work, we propose an Adaptive Smooth Tchebycheff framework that resolves this tension by dynamically modulating the curvature of the optimization landscape. We introduce a novel conflict-driven controller that regulates the optimization smoothness based on real-time gradient interference. This allows the agent to anneal toward precise, non-convex scalarization when objectives align, while elastically reverting to stable, smooth approximations when destructive gradient conflicts emerge. We validate our approach on a challenging robotic stealth visual search task -- a proxy for monitoring of protected/fragile ecosystems -- where an agent must balance search, exposure/interference minimization and exploration speed. Extensive ablations confirm that our conflict-aware adaptation enables the robust discovery of Pareto-optimal policies in non-convex regions inaccessible to linear baselines and unstable for static non-linear methods.
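To make the idea of conflict-driven smoothing concrete, the following is a minimal sketch and not the authors' implementation. It assumes the commonly used log-sum-exp form of the smooth Tchebycheff scalarization, measures conflict as the most negative pairwise cosine similarity between per-objective gradients, and uses a simple first-order update for the smoothing parameter. The function names, the conflict measure, the update rule, and the constants `mu_min`, `mu_max`, and `rate` are all illustrative assumptions; the paper's controller may differ.

```python
import numpy as np

def smooth_tchebycheff(losses, weights, ideal, mu):
    """Log-sum-exp smoothing of max_i w_i * (l_i - z_i); mu > 0 sets the smoothness."""
    scaled = weights * (losses - ideal) / mu
    shift = scaled.max()  # subtract the max before exponentiating for numerical stability
    return mu * (shift + np.log(np.exp(scaled - shift).sum()))

def gradient_conflict(grads):
    """Assumed conflict measure in [0, 1] from pairwise cosine similarity.

    Returns 0 when no pair of gradients has negative cosine similarity and
    approaches 1 when some pair points in opposite directions.
    """
    unit = grads / (np.linalg.norm(grads, axis=1, keepdims=True) + 1e-12)
    cos = unit @ unit.T
    off_diag = cos[~np.eye(len(grads), dtype=bool)]
    return float(np.clip(-off_diag.min(), 0.0, 1.0))

def update_mu(mu, conflict, mu_min=0.01, mu_max=1.0, rate=0.1):
    """Hypothetical controller: drift toward mu_max under conflict, toward mu_min otherwise."""
    target = mu_min + conflict * (mu_max - mu_min)
    return mu + rate * (target - mu)

# Two objectives with equal weights and an ideal point at the origin.
losses = np.array([1.0, 3.0])
weights = np.array([0.5, 0.5])
ideal = np.zeros(2)

sharp = smooth_tchebycheff(losses, weights, ideal, mu=0.01)  # close to the hard max, 1.5
smooth = smooth_tchebycheff(losses, weights, ideal, mu=1.0)  # smoother upper bound on the max

opposed = np.array([[1.0, 0.0], [-1.0, 0.0]])  # directly conflicting gradients
aligned = np.array([[1.0, 0.0], [1.0, 1.0]])   # gradients with positive cosine similarity

mu_after_conflict = update_mu(0.5, gradient_conflict(opposed))  # mu increases
mu_after_aligned = update_mu(0.5, gradient_conflict(aligned))   # mu decreases
```

For m objectives, the log-sum-exp value lies between the hard Tchebycheff maximum and that maximum plus mu * log(m), so shrinking mu tightens the approximation while enlarging it spreads gradient signal across objectives instead of concentrating it on the single worst one.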
Website: https://alejandromllo.github.io/research/pasta/