🤖 AI Summary
This work addresses the challenge of efficiently adapting general-purpose imitation policies to novel task objectives and constraints while maintaining data efficiency and deployment robustness. The authors propose an instruction-conditioned policy optimization framework that integrates imitation learning with reinforcement learning, leveraging natural language task descriptions to automatically generate reward functions. For the first time, this approach combines human feedback on intermediate trajectories with a Eureka-style reward generation mechanism to enable personalized policy refinement. Evaluated on simulated pick-and-place tasks, the method significantly outperforms feedback-free baselines, achieving enhanced robustness with reduced computational overhead and enabling efficient reuse of general policies across diverse task configurations.
📝 Abstract
This paper presents PRISM: an instruction-conditioned refinement method for imitation policies in robotic manipulation. The approach combines Imitation Learning (IL) and Reinforcement Learning (RL) in a seamless pipeline, such that an imitation policy for a broad, generic task, learned from a set of user-guided demonstrations, can be refined through reinforcement to produce new, unseen fine-grained behaviours. The refinement process follows the Eureka paradigm, in which reward functions for RL are iteratively generated from an initial natural-language task description. The presented approach builds on this mechanism to adapt a generic IL policy to new goal configurations and newly introduced constraints, additionally incorporating human feedback corrections on intermediate rollouts, which enables policy reuse and therefore data efficiency. Results for a pick-and-place task in a simulated scenario show that the proposed method outperforms policies trained without human feedback, improving robustness at deployment and reducing computational burden.
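The refinement loop described above can be sketched in miniature. This is an illustrative toy, not the authors' implementation: `generate_reward_fn` stands in for the Eureka-style LLM step (here a trivial keyword scorer), `train_rl` stands in for RL fine-tuning (here a best-of-candidates selection), and policies are represented as simple word lists.

```python
# Hypothetical sketch of a PRISM-style refinement loop.
# All function names and the toy policy/reward representations are
# assumptions for illustration, not the paper's actual API.

def generate_reward_fn(task_description, feedback):
    """Eureka-style step: a reward is derived from the natural-language
    task description plus any accumulated human feedback."""
    prompt = task_description + (" | feedback: " + feedback if feedback else "")
    keywords = set(prompt.lower().split())
    # Toy reward: count how many behaviour tokens match the prompt.
    return lambda rollout: sum(1 for word in rollout if word in keywords)

def train_rl(policy, reward_fn, candidates):
    """Stand-in for RL fine-tuning: keep whichever behaviour (the current
    IL policy or a candidate refinement) maximises the generated reward."""
    return max([policy] + candidates, key=reward_fn)

def refine(il_policy, task_description, feedback_rounds, candidates):
    """Iteratively regenerate the reward, refine the policy, and fold
    human feedback on intermediate rollouts into the next iteration."""
    policy, feedback = il_policy, ""
    for round_feedback in feedback_rounds:
        reward_fn = generate_reward_fn(task_description, feedback)
        policy = train_rl(policy, reward_fn, candidates)
        feedback = round_feedback  # human correction for the next round
    return policy
```

In this toy, a first round steered only by "pick place cube" prefers a plain pick-and-place behaviour, while feedback such as "avoid obstacle" shifts the regenerated reward toward the constrained variant in the next round, mirroring how intermediate human corrections personalise the refined policy.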