🤖 AI Summary
Deep reinforcement learning agents often coordinate their perception and decision-making modules inefficiently, particularly under high-dimensional sensory inputs and dynamically varying feature correlations.
Method: The paper models the internal perception-policy interaction within a single agent as a cooperative Stackelberg game: the perception module acts as the leader, optimizing interpretable and robust feature representations, while the policy module serves as the follower, executing decisions based on those representations. Theoretical convergence guarantees are provided via a modified Bellman operator. The method integrates Stackelberg equilibrium computation, optimization under the modified Bellman operator, the PPO policy-gradient framework, and end-to-end differentiable perceptual encoding.
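Neither the summary nor the abstract states the formal objective, but the leader-follower structure suggests a bilevel problem. The sketch below is an inferred formulation, where f_phi denotes the perception encoder and pi_theta the policy; the notation and the operator's exact form are assumptions, not the paper's definitions:

```latex
% Leader: perception parameters \phi; follower: policy parameters \theta.
% Follower best-responds to the representation z_t = f_\phi(s_t):
\theta^\star(\phi) \in \arg\max_\theta \;
  J(\phi, \theta) = \mathbb{E}\!\left[\textstyle\sum_{t \ge 0} \gamma^t r_t
  \;\middle|\; a_t \sim \pi_\theta\!\left(\cdot \mid f_\phi(s_t)\right)\right]

% Leader optimizes its encoding anticipating that best response
% (cooperative Stackelberg: both levels maximize the same return):
\phi^\star \in \arg\max_\phi \; J\!\left(\phi, \theta^\star(\phi)\right)

% One plausible form of a Bellman operator modified to act on encoded states:
(\mathcal{T}_\phi Q)(f_\phi(s), a) =
  \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\!\left[ r(s, a)
  + \gamma \max_{a'} Q\!\left(f_\phi(s'), a'\right) \right]
```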
Results: On the Atari BeamRider task, the approach achieves approximately 30% higher cumulative return than standard PPO, empirically validating the game-theoretic perception-decision decoupling mechanism on both representation quality and policy performance.
📝 Abstract
Deep reinforcement learning agents often struggle to coordinate perception and decision-making components effectively, particularly in environments with high-dimensional sensory inputs where feature relevance varies. This work introduces SPRIG (Stackelberg Perception-Reinforcement learning with Internal Game dynamics), a framework that models the internal perception-policy interaction within a single agent as a cooperative Stackelberg game. In SPRIG, the perception module acts as the leader, strategically processing raw sensory states, while the policy module follows, making decisions based on the extracted features. SPRIG provides theoretical guarantees through a modified Bellman operator while preserving the benefits of modern policy optimization. Experimental results on the Atari BeamRider environment demonstrate SPRIG's effectiveness, achieving around 30% higher returns than standard PPO through its game-theoretic balance of feature extraction and decision-making.
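The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of how such a leader-follower update might look, assuming alternating updates in which the follower best-responds to frozen features and the leader then differentiates through the composed objective. All class names, the MLP encoder (a CNN is more likely for Atari), dimensions, learning rates, and the update schedule are assumptions.

```python
# Hypothetical sketch of a SPRIG-style leader-follower update; not the
# paper's implementation. Architecture and schedule are assumptions.
import torch
import torch.nn as nn

class PerceptionLeader(nn.Module):
    """Leader: encodes raw observations into a feature representation."""
    def __init__(self, obs_dim: int, feat_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.encoder(obs)

class PolicyFollower(nn.Module):
    """Follower: maps features to action logits (actor) and a value (critic)."""
    def __init__(self, feat_dim: int, n_actions: int):
        super().__init__()
        self.pi = nn.Linear(feat_dim, n_actions)
        self.v = nn.Linear(feat_dim, 1)

    def forward(self, feat: torch.Tensor):
        return self.pi(feat), self.v(feat)

def ppo_clip_loss(new_logp, old_logp, adv, clip=0.2):
    """Standard PPO clipped surrogate objective (negated for minimization)."""
    ratio = torch.exp(new_logp - old_logp)
    return -torch.min(ratio * adv,
                      torch.clamp(ratio, 1 - clip, 1 + clip) * adv).mean()

leader = PerceptionLeader(obs_dim=8, feat_dim=32)
follower = PolicyFollower(feat_dim=32, n_actions=4)
opt_leader = torch.optim.Adam(leader.parameters(), lr=1e-4)
opt_follower = torch.optim.Adam(follower.parameters(), lr=3e-4)

# Stand-ins for one rollout batch (normally collected from the environment).
obs = torch.randn(64, 8)
actions = torch.randint(0, 4, (64,))
old_logp = torch.randn(64)   # log-probs under the behavior policy
adv = torch.randn(64)        # precomputed advantage estimates

# Follower step: best-respond to the leader's *frozen* representation.
feat = leader(obs).detach()
logits, _ = follower(feat)
logp = torch.distributions.Categorical(logits=logits).log_prob(actions)
follower_loss = ppo_clip_loss(logp, old_logp, adv)
opt_follower.zero_grad()
follower_loss.backward()
opt_follower.step()

# Leader step: gradients flow end-to-end through the updated follower,
# shaping the representation around the follower's anticipated response.
logits, _ = follower(leader(obs))
logp = torch.distributions.Categorical(logits=logits).log_prob(actions)
leader_loss = ppo_clip_loss(logp, old_logp, adv)
opt_leader.zero_grad()
leader_loss.backward()  # follower grads are also populated but not applied
opt_leader.step()       # only the leader's parameters move in this step
```

Detaching the features during the follower step mirrors the Stackelberg ordering (the follower reacts to a fixed representation), while the leader's step backpropagates through the whole pipeline, which is one way to read the "end-to-end differentiable perceptual encoding" mentioned in the summary.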