🤖 AI Summary
This work addresses the challenge of reliably translating high-level action plans from large language models (LLMs) into robust low-level control policies, a task further complicated by the absence of mechanisms to correct erroneous skill descriptions. To bridge this gap, the authors propose SCALAR, a framework that bidirectionally couples LLMs with deep reinforcement learning (RL). SCALAR leverages symbolic planning to generate skills annotated with preconditions and effects, which are then instantiated as executable policies via RL. It further incorporates Pivotal Trajectory Analysis to diagnose execution failures and refine the LLM’s initial skill specifications, alongside Frontier Checkpointing to preserve environment states at skill boundaries, thereby enhancing sample efficiency and robustness. Evaluated in the Craftax environment, SCALAR achieves an 88.2% success rate in diamond collection—1.9× higher than the best baseline—and reaches the previously inaccessible Gnomish Mines with a 9.1% success rate.
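The bidirectional coupling described above can be sketched in miniature: the LLM emits a symbolic skill specification (preconditions and effects), and execution traces from RL are used to prune parts of the specification that reality contradicts. The `Skill` fields, predicate names, and `refine_spec` logic below are illustrative assumptions, not the paper's actual interface.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """A hypothetical LLM-proposed skill: a name plus symbolic
    preconditions and effects expressed as sets of predicates."""
    name: str
    preconditions: set  # predicates the LLM claims must hold before execution
    effects: set        # predicates the LLM claims will hold after execution

def refine_spec(skill: Skill, observed_before: set, observed_after: set) -> Skill:
    """One possible feedback direction: keep only the preconditions that
    actually held before a successful execution, and only the effects
    that were actually observed afterward."""
    skill.preconditions &= observed_before
    skill.effects &= observed_after
    return skill

# Example: the LLM over-specified a precondition ("has_torch") and
# hallucinated an effect ("pickaxe_broken"); a successful rollout prunes both.
mine = Skill(
    name="mine_diamond",
    preconditions={"has_iron_pickaxe", "near_diamond", "has_torch"},
    effects={"has_diamond", "pickaxe_broken"},
)
refine_spec(
    mine,
    observed_before={"has_iron_pickaxe", "near_diamond"},
    observed_after={"has_diamond"},
)
# mine.preconditions is now {"has_iron_pickaxe", "near_diamond"}
# mine.effects is now {"has_diamond"}
```

The set-intersection rule is deliberately simplistic: it stands in for the paper's Pivotal Trajectory Analysis, which diagnoses failures from full RL trajectories rather than single before/after snapshots.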
📝 Abstract
LLM-based agents excel when given high-level action APIs but struggle to ground language in low-level control. Prior work uses LLMs to generate skills or reward functions for RL, but these one-shot approaches lack feedback to correct specification errors. We introduce SCALAR, a bidirectional framework coupling LLM planning with RL through a learned skill library. The LLM proposes skills with preconditions and effects; RL trains a policy for each skill and feeds execution results back to iteratively refine the specifications, improving robustness to initial errors. Pivotal Trajectory Analysis corrects the LLM's priors by analyzing RL trajectories; Frontier Checkpointing optionally saves environment states at skill boundaries to improve sample efficiency. On Craftax, SCALAR achieves 88.2% diamond collection, a 1.9× improvement over the best baseline, and reaches the Gnomish Mines 9.1% of the time, where prior methods fail entirely.
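Frontier Checkpointing can be pictured as a simple keyed store of environment snapshots taken at skill boundaries, so that training a later skill restarts from the frontier instead of replaying the whole prefix of earlier skills. This is a minimal sketch under that assumption; the class name, interface, and the dict-based environment state are all hypothetical.

```python
import copy

class FrontierCheckpoints:
    """Sketch of skill-boundary checkpointing: store a deep copy of the
    environment state whenever a skill completes, and restore a fresh
    copy when training the next skill in the plan."""

    def __init__(self):
        self._states = {}

    def save(self, skill_name: str, env_state: dict) -> None:
        # Deep-copy so later mutation of the live environment
        # cannot corrupt the stored frontier.
        self._states[skill_name] = copy.deepcopy(env_state)

    def restore(self, skill_name: str) -> dict:
        # Return a fresh copy; callers may mutate it freely.
        return copy.deepcopy(self._states[skill_name])

# Usage: checkpoint after "craft_pickaxe" completes, then reset RL
# training for the next skill from that boundary.
ckpt = FrontierCheckpoints()
ckpt.save("craft_pickaxe", {"inventory": ["wood", "stone"], "pos": (3, 7)})
state = ckpt.restore("craft_pickaxe")
```

The sample-efficiency gain comes from amortization: without checkpointing, every rollout for a deep skill (e.g. diamond collection) must first re-execute the entire chain of earlier skills.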