Real-World Reinforcement Learning of Active Perception Behaviors

📅 2025-11-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Robots struggle to autonomously perform active perception for acquiring task-critical state in partially observable environments. This paper proposes Asymmetric Advantage-Weighted Regression (AAWR), an offline reinforcement learning method that leverages privileged sensors, available only during training, to learn a high-fidelity value function that guides the learning of active perception policies. Bootstrapping from a small set of expert demonstrations and a coarse initial policy, AAWR optimizes the policy efficiently within an offline RL framework. Evaluated on three physical robots across eight manipulation tasks, AAWR consistently produces robust active perception behaviors and outperforms prior methods under severe partial observability, improving average task success rate by 27.4%. Its core contribution is integrating privileged information into offline RL via advantage-weighted regression to drive active perception, enabling efficient, deployment-ready policy learning without online exploration.

📝 Abstract
A robot's instantaneous sensory observations do not always reveal task-relevant state information. Under such partial observability, optimal behavior typically involves explicitly acting to gain the missing information. Today's standard robot learning techniques struggle to produce such active perception behaviors. We propose a simple real-world robot learning recipe to efficiently train active perception policies. Our approach, asymmetric advantage weighted regression (AAWR), exploits access to "privileged" extra sensors at training time. The privileged sensors enable training high-quality privileged value functions that aid in estimating the advantage of the target policy. Bootstrapping from a small number of potentially suboptimal demonstrations and an easy-to-obtain coarse policy initialization, AAWR quickly acquires active perception behaviors and boosts task performance. In evaluations on 8 manipulation tasks on 3 robots spanning varying degrees of partial observability, AAWR synthesizes reliable active perception behaviors that outperform all prior approaches. When initialized with a "generalist" robot policy that struggles with active perception tasks, AAWR efficiently generates information-gathering behaviors that allow it to operate under severe partial observability for manipulation tasks. Website: https://penn-pal-lab.github.io/aawr/
Problem

Research questions and friction points this paper is trying to address.

A robot's instantaneous observations can omit task-relevant state, so optimal behavior requires actively acting to gather the missing information.
Standard robot learning techniques struggle to produce such active perception behaviors.
Learning these behaviors through online exploration is costly and impractical on real hardware.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Asymmetric advantage-weighted regression (AAWR) for active perception.
Uses privileged sensors, available only at training time, to train high-quality value functions.
Bootstraps from a small number of demonstrations and a coarse policy initialization.
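The asymmetric recipe above can be illustrated with a toy sketch: fit a value function on *privileged* observations (the full state, available only in the training dataset), use it to estimate per-sample advantages, then regress the policy on the robot's *partial* observations with exponentiated-advantage weights. This is a minimal single-step illustration with linear models and synthetic data, not the paper's implementation; all variable names and the linear-regression stand-ins for neural networks are assumptions for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline dataset. The privileged observation is the full hidden state;
# the robot's own observation is only a partial, noisy view of it.
N = 256
hidden = rng.uniform(-1, 1, size=(N, 2))              # privileged: full state
obs = hidden[:, :1] + 0.1 * rng.normal(size=(N, 1))   # partial observation
actions = rng.uniform(-1, 1, size=(N, 1))
# Reward depends on the hidden dimension the robot cannot directly see.
rewards = -np.abs(hidden[:, 1] - actions[:, 0])

# 1) Privileged critic: fit V(s_priv) on returns (single-step toy case:
#    return == reward). Linear least squares stands in for a neural net.
Xp = np.hstack([hidden, np.ones((N, 1))])
w_v, *_ = np.linalg.lstsq(Xp, rewards, rcond=None)
V = Xp @ w_v

# 2) Advantages estimated with the privileged critic.
A = rewards - V

# 3) Advantage-weighted regression: fit the policy on PARTIAL observations
#    only, weighting each (obs, action) pair by exp(A / beta).
beta = 0.5
wgt = np.exp(np.clip(A / beta, -10.0, 10.0))   # clip for numerical safety
Xo = np.hstack([obs, np.ones((N, 1))])
sw = np.sqrt(wgt)[:, None]                      # weighted least squares
w_pi, *_ = np.linalg.lstsq(sw * Xo, (sw[:, 0] * actions[:, 0]), rcond=None)

def policy(o):
    """Deployed policy: sees only the partial observation."""
    return np.array([o[0] * w_pi[0] + w_pi[1]])
```

The asymmetry is in step 1 versus step 3: the critic consumes `hidden` (privileged) while the policy consumes only `obs`, so at deployment no privileged sensor is needed. In AAWR itself these regressors are learned networks and the dataset mixes demonstrations with rollouts from a coarse initial policy.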