🤖 AI Summary
Behavior cloning policies often lack robustness and achieve low success rates in fine robotic manipulation. This work proposes UF-OPS, a novel approach that, for the first time, enables online guidance of action selection from a black-box diffusion policy without updating its parameters. By training a verifier function on policy rollout data, UF-OPS scores candidate actions at inference time and selects those most likely to succeed. The method is lightweight and computationally efficient, achieving an average 49% improvement in task success rate across five real-world robotic manipulation tasks. Extensive experiments demonstrate its effectiveness and generalization in both simulation and physical environments.
📝 Abstract
In recent years, Behavior Cloning (BC) has become one of the most prevalent methods for enabling robots to mimic human demonstrations. However, despite their successes, BC policies are often brittle and struggle with precise manipulation. To overcome these issues, we propose UF-OPS, an Update-Free On-Policy Steering method that enables the robot to predict the success likelihood of its actions and adapt its strategy at execution time. We accomplish this by training verifier functions on policy rollout data obtained during an initial evaluation of the policy. These verifiers are subsequently used to steer the base policy toward actions with a higher likelihood of success. Our method improves the performance of a black-box diffusion policy without changing its parameters, making it lightweight and flexible. We present results from both simulation and real-world experiments, achieving an average 49% improvement in success rate over the base policy across five real-world tasks.
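The steering loop described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `policy_sample` stands in for drawing candidate actions from a frozen black-box diffusion policy, and `verifier_score` stands in for the learned verifier; both names and the toy scoring rule are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_sample(obs, n_candidates):
    """Stand-in for sampling n candidate actions from a frozen
    (black-box) diffusion policy; no parameter updates occur."""
    return obs + 0.1 * rng.standard_normal((n_candidates, obs.shape[0]))

def verifier_score(obs, actions):
    """Stand-in verifier: a predicted success likelihood per candidate
    (higher is better). The real verifier is trained on rollout data."""
    return -np.linalg.norm(actions - obs, axis=1)  # toy heuristic only

def steer(obs, n_candidates=16):
    """Update-free steering: sample candidates, score them with the
    verifier, and execute the highest-scoring action."""
    candidates = policy_sample(obs, n_candidates)
    scores = verifier_score(obs, candidates)
    return candidates[np.argmax(scores)]

obs = np.zeros(4)
action = steer(obs)
```

Because the base policy is queried only for samples and never fine-tuned, the extra cost per step is one batch of candidate samples plus a cheap verifier forward pass.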