Robust Instant Policy: Leveraging Student's t-Regression Model for Robust In-context Imitation Learning of Robot Manipulation

📅 2025-06-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
In-context imitation learning (In-Context IL) suffers from hallucination-induced failures when large language models (LLMs) serve as zero-shot policies: spurious outputs occasionally yield erroneous trajectories. To address this, we propose a robust trajectory aggregation method based on Student's t-regression, which performs outlier-resistant weighted fusion of multiple LLM-generated candidate trajectories. Leveraging the heavy-tailed property of the t-distribution, our approach automatically downweights hallucinated outliers without fine-tuning the LLM and requires only a few demonstrations. This is the first integration of t-regression into the In-Context IL framework, enabling low-data-dependency, plug-and-play trajectory synthesis. Evaluated on both simulated and real-world robotic manipulation tasks, our method improves average task success rates by at least 26%, significantly outperforming existing approaches, especially in low-shot, everyday scenarios.

📝 Abstract
Imitation learning (IL) aims to enable robots to perform tasks autonomously by observing a few human demonstrations. Recently, a variant of IL, called In-Context IL, utilized off-the-shelf large language models (LLMs) as instant policies that understand the context from a few given demonstrations to perform a new task, rather than explicitly updating network models with large-scale demonstrations. However, its reliability in the robotics domain is undermined by the hallucination issues of LLMs: an LLM-based instant policy occasionally generates poor trajectories that deviate from the given demonstrations. To alleviate this problem, we propose a new robust in-context imitation learning algorithm called the robust instant policy (RIP), which utilizes a Student's t-regression model to be robust against the hallucinated trajectories of instant policies, allowing reliable trajectory generation. Specifically, RIP generates several candidate robot trajectories to complete a given task from an LLM and aggregates them using the Student's t-distribution, which is beneficial for ignoring outliers (i.e., hallucinations); thereby, a trajectory robust to hallucinations is generated. Our experiments, conducted in both simulated and real-world environments, show that RIP significantly outperforms state-of-the-art IL methods, with at least 26% improvement in task success rates, particularly in low-data scenarios for everyday tasks. Video results available at https://sites.google.com/view/robustinstantpolicy.
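The aggregation step described above can be sketched as an iteratively reweighted mean, the standard EM update for the location of a Student's t-distribution. This is a minimal illustration, not the authors' exact model: it assumes candidate trajectories are aligned arrays of equal length, uses an isotropic per-step scale, and fixes the degrees of freedom `nu` (all assumptions not specified in this summary). Heavy tails (small `nu`) make the weights shrink sharply for candidates far from the consensus, which is how hallucinated outliers get downweighted.

```python
import numpy as np

def t_robust_aggregate(trajectories, nu=3.0, iters=20):
    """Fuse K candidate trajectories into one outlier-resistant trajectory.

    trajectories: array-like of shape (K, T, D) -- K candidates,
                  T time steps, D action/state dimensions.
    nu: Student's t degrees of freedom; smaller nu -> heavier tails
        -> stronger suppression of outlying candidates.
    Returns a (T, D) aggregated trajectory.
    Sketch only: pointwise EM location update for an isotropic t model.
    """
    X = np.asarray(trajectories, dtype=float)   # (K, T, D)
    K, T, D = X.shape
    mu = X.mean(axis=0)                         # init with plain mean
    for _ in range(iters):
        sq = ((X - mu) ** 2).sum(axis=2)        # (K, T) squared residuals
        scale = np.maximum(sq.mean(axis=0), 1e-12)  # (T,) crude scale
        # EM weights: residuals far beyond the scale get weight ~ nu/d^2
        w = (nu + D) / (nu + sq / scale)        # (K, T)
        mu = (w[..., None] * X).sum(axis=0) / w.sum(axis=0)[..., None]
    return mu
```

For intuition: with four candidates near the demonstrated path and one hallucinated candidate far away, the plain mean is dragged toward the outlier, while the t-weighted mean stays close to the inlier consensus.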
Problem

Research questions and friction points this paper is trying to address.

Addresses unreliable LLM-generated robot trajectories in imitation learning
Reduces hallucination issues in instant policies for robotics
Improves robustness in low-data imitation learning scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Student's t-regression for robustness
Aggregates trajectories with t-distribution
Improves task success rates by at least 26%
Hanbit Oh
National Institute of Advanced Industrial Science and Technology (AIST)
Robot learning · Imitation learning · Learning from demonstration
Andrea M. Salcedo-Vázquez
Industrial Cyber-Physical Systems Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Japan
I. Ramirez-Alpizar
Industrial Cyber-Physical Systems Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Japan
Yukiyasu Domae
AIST
Machine vision · Manipulation · Automation · Experiential autonomy