🤖 AI Summary
This work addresses the poor steerability of multitask robot policies, which often fail to respond to new instructions issued during execution. To this end, we propose the ReSteer framework, which introduces, for the first time, a quantitative steerability metric based on trajectory distribution overlap and devises a method to identify low-steerability states without requiring full policy rollouts. ReSteer employs a closed-loop self-refinement pipeline comprising a steerability estimator, a steerable data generator, and a policy self-refinement training procedure, which together improve the policy's responsiveness to user commands during execution. Evaluated in the LIBERO simulation environment, our approach improves steerability by 11% over 18k rollouts. Real-robot experiments further demonstrate that the system can take up new user instructions at arbitrary points during execution.
📝 Abstract
Despite strong multitask pretraining, existing policies often exhibit poor task steerability. For example, a robot executing "close the oven" may fail to respond to a new instruction, "put the bowl in the sink", issued while it is moving toward the oven, even though it can complete both tasks when they are executed separately. We propose ReSteer, a framework to quantify and improve task steerability in multitask robot policies. We conduct an exhaustive evaluation of state-of-the-art policies, revealing a common lack of steerability. We find that steerability is associated with limited overlap among training task trajectory distributions, and introduce a proxy metric to measure this overlap from policy behavior. Building on this insight, ReSteer improves steerability via three components: (i) a steerability estimator that identifies low-steerability states without full-rollout evaluation, (ii) a steerable data generator that synthesizes motion segments from these states, and (iii) a self-refinement pipeline that improves policy steerability using the generated data. In simulation on LIBERO, ReSteer improves steerability by 11% over 18k rollouts. In real-world experiments, we show that improved steerability is critical for interactive use, enabling users to instruct robots to perform any task at any time. We hope this work motivates further study on quantifying steerability and data collection strategies for large robot policies.
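To make the notion of steerability concrete, the sketch below shows one way it could be measured empirically from rollouts in which the instruction changes mid-execution. This is an illustrative assumption, not the paper's metric or estimator: the abstract describes a proxy metric based on trajectory distribution overlap and an estimator that avoids full rollouts, whereas this example is the plain rollout-based baseline. The `SwitchTrial` record, the state labels, the `threshold` parameter, and the trial data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchTrial:
    """One rollout in which the instruction changes mid-execution (hypothetical record)."""
    state_id: str   # label for the state at which the new instruction arrives
    old_task: str   # instruction being executed before the switch
    new_task: str   # instruction issued at the switch
    success: bool   # whether the policy completed new_task afterwards


def steerability_rate(trials):
    """Fraction of switch trials in which the new task was completed."""
    if not trials:
        raise ValueError("need at least one trial")
    return sum(t.success for t in trials) / len(trials)


def low_steerability_states(trials, threshold=0.5):
    """Return the state labels whose per-state switch success rate is below threshold."""
    by_state = defaultdict(list)
    for t in trials:
        by_state[t.state_id].append(t.success)
    return sorted(
        state for state, outcomes in by_state.items()
        if sum(outcomes) / len(outcomes) < threshold
    )


# Made-up trials for illustration only; these are not results from the paper.
trials = [
    SwitchTrial("near_oven", "close the oven", "put the bowl in the sink", False),
    SwitchTrial("near_oven", "close the oven", "put the bowl in the sink", False),
    SwitchTrial("near_oven", "close the oven", "put the bowl in the sink", True),
    SwitchTrial("table_center", "close the oven", "put the bowl in the sink", True),
]

print(steerability_rate(trials))        # 0.5
print(low_steerability_states(trials))  # ['near_oven']
```

A rollout-based measure like this needs one full rollout per (state, instruction) pair, which is the cost that a rollout-free steerability estimator is meant to avoid.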