AURA: Agentic Upskilling via Reinforced Abstractions

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Converting high-level task prompts into agile robotic control policies suffers from combinatorial explosion and from the difficulty of deploying multi-stage reinforcement learning (RL) pipelines. Method: We propose an LLM-driven curriculum RL framework featuring a novel schema-constrained YAML workflow auto-generation mechanism, integrated with a retrieval-augmented LLM agent for autonomous curriculum design, execution, and iterative optimization; static YAML schema validation ensures logical correctness prior to GPU training, eliminating manual intervention. Contribution/Results: This is the first work to enable end-to-end generation of deployable control policies directly from natural language prompts, achieving zero-shot cross-environment deployment on a custom humanoid robot. Experiments demonstrate substantial improvements in both training efficiency and policy generalization over existing LLM-guided baselines.

📝 Abstract
We study the combinatorial explosion involved in translating high-level task prompts into deployable control policies for agile robots through multi-stage reinforcement learning. We introduce AURA (Agentic Upskilling via Reinforced Abstractions), a schema-centric curriculum RL framework that leverages Large Language Models (LLMs) as autonomous designers of multi-stage curricula. AURA transforms user prompts into YAML workflows that encode full reward functions, domain randomization strategies, and training configurations. All files are statically validated against a schema before any GPU time is consumed, ensuring reliable and efficient execution without human intervention. A retrieval-augmented feedback loop allows specialized LLM agents to design, execute, and refine staged curricula based on prior training results stored in a vector database, supporting continual improvement over time. Ablation studies highlight the importance of retrieval for curriculum quality and convergence stability. Quantitative experiments show that AURA consistently outperforms LLM-guided baselines on GPU-accelerated training frameworks. In qualitative tests, AURA successfully trains end-to-end policies directly from user prompts and deploys them zero-shot on a custom humanoid robot across a range of environments. By abstracting away the complexity of curriculum design, AURA enables scalable and adaptive policy learning pipelines that would be prohibitively complex to construct by hand.
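The static validation step the abstract describes, checking every generated YAML workflow against a schema before any GPU time is spent, can be sketched as follows. This is a minimal illustration only: the schema, field names (`stages`, `reward_terms`, `domain_randomization`, `max_steps`), and error format are assumptions, not AURA's actual workflow format.

```python
# Minimal sketch of pre-training schema validation, in the spirit of AURA's
# static YAML checks. The workflow is shown as an already-parsed dict; the
# field names below are illustrative assumptions, not AURA's real schema.

REQUIRED_STAGE_KEYS = {
    "name": str,                    # human-readable stage identifier
    "reward_terms": dict,           # reward term name -> weight
    "domain_randomization": dict,   # parameter -> [low, high] range
    "max_steps": int,               # training budget for this stage
}

def validate_workflow(workflow: dict) -> list[str]:
    """Return a list of schema errors; an empty list means the workflow
    is safe to hand to the GPU training pipeline."""
    errors = []
    stages = workflow.get("stages")
    if not isinstance(stages, list) or not stages:
        return ["workflow must contain a non-empty 'stages' list"]
    for i, stage in enumerate(stages):
        for key, typ in REQUIRED_STAGE_KEYS.items():
            if key not in stage:
                errors.append(f"stage {i}: missing required key '{key}'")
            elif not isinstance(stage[key], typ):
                errors.append(f"stage {i}: '{key}' must be {typ.__name__}")
    return errors

# A curriculum as it might look after parsing the LLM-generated YAML;
# the second stage is deliberately incomplete and should be rejected.
workflow = {
    "stages": [
        {"name": "stand", "reward_terms": {"upright": 1.0},
         "domain_randomization": {"ground_friction": [0.5, 1.2]},
         "max_steps": 2_000_000},
        {"name": "walk", "reward_terms": {"forward_velocity": 1.0}},
    ]
}

errors = validate_workflow(workflow)
```

Catching such errors statically is what lets the pipeline run without human intervention: a malformed curriculum is rejected before training starts rather than failing mid-run.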
Problem

Research questions and friction points this paper is trying to address.

Translating high-level task prompts into deployable robot control policies
Overcoming combinatorial explosion in multi-stage reinforcement learning
Automating curriculum design for scalable policy learning pipelines
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-designed multi-stage curriculum RL
Schema-validated YAML workflow automation
Retrieval-augmented feedback for continual improvement
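The retrieval-augmented feedback loop stores prior training results in a vector database so LLM agents can condition new curricula on similar past runs. A toy version of that retrieval step might look like the sketch below; the embeddings, stored summaries, and class names are hypothetical, and AURA's actual embedding model and database are not specified here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class RunMemory:
    """Toy vector store of past curriculum runs, queried before each new design."""
    def __init__(self):
        self.entries = []  # list of (embedding, summary) pairs

    def add(self, embedding, summary):
        self.entries.append((embedding, summary))

    def retrieve(self, query, k=2):
        """Return the k stored summaries most similar to the query embedding."""
        ranked = sorted(self.entries, key=lambda e: cosine(e[0], query), reverse=True)
        return [summary for _, summary in ranked[:k]]

# Illustrative 2-d embeddings of past training outcomes (hypothetical data).
memory = RunMemory()
memory.add([1.0, 0.0], "stand: converged in 1M steps")
memory.add([0.9, 0.1], "walk: reward plateaued; increase friction randomization")
memory.add([0.0, 1.0], "backflip: failed; stage too aggressive")

# Context retrieved for a new prompt whose embedding is near the locomotion runs.
context = memory.retrieve([1.0, 0.05], k=2)
```

The retrieved summaries would then be injected into the LLM agent's prompt, which is what supports the continual improvement the abstract describes.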
Authors: Alvin Zhu, Yusuke Tanaka, Dennis Hong (University of California, Los Angeles)
Topics: robotics, deep learning, reinforcement learning