BLAZER: Bootstrapping LLM-based Manipulation Agents with Zero-Shot Data Generation

📅 2025-10-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Robotics has long been constrained by small-scale, low-diversity manipulation demonstrations collected manually, which hinders generalizable learning. This paper introduces BLAZER, presented as the first framework to employ large language models (LLMs) as zero-shot task planners that autonomously generate diverse, closed-loop, feedback-driven manipulation trajectories in simulation, and then deploys the acquired skills directly on real robots. Its core contributions are: (1) fully automated, high-quality demonstration generation without human annotation; (2) unsupervised self-improvement via iterative policy refinement guided by execution feedback; and (3) cross-task generalization with support for lightweight LLM deployment. Experiments show that BLAZER significantly improves zero-shot manipulation performance in both simulated and real-world settings, successfully executing unseen tasks and achieving end-to-end sim-to-real transfer.

๐Ÿ“ Abstract
Scaling data and models has played a pivotal role in the remarkable progress of computer vision and language. Inspired by these domains, recent efforts in robotics have similarly focused on scaling both data and model size to develop more generalizable and robust policies. However, unlike vision and language, robotics lacks access to internet-scale demonstrations across diverse robotic tasks and environments. As a result, the scale of existing datasets typically suffers from the need for manual data collection and curation. To address this problem, here we propose BLAZER, a framework that learns manipulation policies from automatically generated training data. We build on the zero-shot capabilities of LLM planners and automatically generate demonstrations for diverse manipulation tasks in simulation. Successful examples are then used to finetune an LLM and to improve its planning capabilities without human supervision. Notably, while BLAZER training requires access to the simulator's state, we demonstrate direct transfer of acquired skills to sensor-based manipulation. Through extensive experiments, we show BLAZER to significantly improve zero-shot manipulation in both simulated and real environments. Moreover, BLAZER improves on tasks outside of its training pool and enables downscaling of LLM models. Our code and data will be made publicly available on the project page.
Problem

Research questions and friction points this paper is trying to address.

Automatically generating training data for robot manipulation policies
Addressing lack of internet-scale robotic demonstrations
Improving zero-shot manipulation in simulated and real environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically generates training data using LLM planners
Fine-tunes LLM with successful demonstrations for planning
Transfers simulator-trained skills to sensor-based manipulation
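The self-improvement loop implied by the points above (zero-shot LLM planning, success filtering in simulation, finetuning on the surviving demonstrations) can be sketched roughly as follows. This is a minimal illustration, not BLAZER's actual code: `llm_plan`, `simulate`, and `finetune` are hypothetical stand-ins for the paper's planner, simulator rollout, and finetuning step.

```python
def bootstrap(llm_plan, simulate, finetune, tasks, iterations=3):
    """Iteratively collect successful LLM-generated demonstrations and
    use them to refine the planner (BLAZER-style self-improvement)."""
    dataset = []
    for _ in range(iterations):
        for task in tasks:
            plan = llm_plan(task)           # zero-shot plan from the LLM
            success = simulate(task, plan)  # closed-loop execution in sim
            if success:                     # keep only successful episodes
                dataset.append((task, plan))
        # Refine the planner on all successes collected so far,
        # without any human supervision.
        llm_plan = finetune(llm_plan, dataset)
    return llm_plan, dataset


# Toy stand-ins (assumptions for illustration, not the paper's components):
def toy_plan(task):
    return f"pick({task})"

def toy_simulate(task, plan):
    return "easy" in task  # pretend only easy tasks succeed

def toy_finetune(planner, data):
    return planner  # no-op refinement in this sketch

policy, demos = bootstrap(toy_plan, toy_simulate, toy_finetune,
                          ["easy_cube", "hard_stack"], iterations=2)
```

In this toy run, only episodes from the succeeding task enter the dataset, mirroring how only successful trajectories are used for finetuning.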