CLI-Gym: Scalable CLI Task Generation via Agentic Environment Inversion

📅 2026-02-11
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the scarcity of scalable, environment-intensive command-line interface (CLI) tasks, which hinders training agents to operate in interactive runtime environments. The authors propose the first end-to-end framework for automatically generating CLI repair tasks: leveraging agent exploration histories and execution feedback, the system retroactively constructs erroneous initial states from known healthy states. Combining a Dockerfile-to-task analogy, environment state tracking, and trajectory collection, the framework autonomously synthesizes realistic repair tasks, producing 1,655 of them, the largest dataset of its kind to date. Fine-tuning on curated successful trajectories from this dataset yields the LiberCoder model, which reaches 46.1% accuracy on Terminal-Bench, an absolute improvement of 21.1%, significantly advancing the training of environment-interactive agents.

πŸ“ Abstract
Agentic coding requires agents to effectively interact with runtime environments, e.g., command line interfaces (CLI), so as to complete tasks like resolving dependency issues, fixing system problems, etc. But it remains underexplored how such environment-intensive tasks can be obtained at scale to enhance agents'capabilities. To address this, based on an analogy between the Dockerfile and the agentic task, we propose to employ agents to simulate and explore environment histories, guided by execution feedback. By tracing histories of a healthy environment, its state can be inverted to an earlier one with runtime failures, from which a task can be derived by packing the buggy state and the corresponding error messages. With our method, named CLI-Gym, a total of 1,655 environment-intensive tasks are derived, being the largest collection of its kind. Moreover, with curated successful trajectories, our fine-tuned model, named LiberCoder, achieves substantial absolute improvements of +21.1% (to 46.1%) on Terminal-Bench, outperforming various strong baselines. To our knowledge, this is the first public pipeline for scalable derivation of environment-intensive tasks.
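The inversion idea in the abstract can be illustrated with a minimal sketch: replay a healthy environment's build history, snapshot the state after each step, and turn the latest still-broken snapshot plus its error message into a repair task whose reference solution is the remaining steps. This is a toy reconstruction for intuition only, not the authors' pipeline; every name here (`EnvState`, `probe`, `invert`, etc.) is hypothetical, and the real system operates on containers with execution feedback rather than this simplified state model.

```python
# Toy sketch of "environment inversion": derive a repair task from the
# history of a healthy environment. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class EnvState:
    installed: set = field(default_factory=set)  # packages present

@dataclass
class RepairTask:
    buggy_state: EnvState   # inverted (earlier, broken) environment
    error_message: str      # runtime failure observed in that state
    reference_fix: list     # remaining steps that restore health

def probe(state: EnvState, required: str):
    """Run a check in the given state; return an error message on failure."""
    if required not in state.installed:
        return f"ImportError: No module named '{required}'"
    return None

def invert(history: list, required: str):
    """Replay the history with per-step snapshots, then walk backwards to
    the most recent snapshot in which the probe still fails."""
    state = EnvState()
    snapshots = [EnvState(set())]            # state before any step
    for step in history:
        state.installed.add(step.removeprefix("pip install "))
        snapshots.append(EnvState(set(state.installed)))
    # snapshots[i] is the state after the first i steps
    for i in range(len(history), -1, -1):
        err = probe(snapshots[i], required)
        if err is not None:
            # pack buggy state + error; leftover steps form the fix
            return RepairTask(snapshots[i], err, history[i:])
    return None

history = ["pip install numpy", "pip install scipy"]
task = invert(history, required="scipy")
# task.buggy_state lacks scipy; task.reference_fix restores it
```

The key property the sketch captures is that the task is derived retroactively: the buggy state is never constructed by hand, only recovered from a known-good history.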
Problem

Research questions and friction points this paper is trying to address.

CLI tasks
environment-intensive tasks
task generation
agentic coding
runtime environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

CLI-Gym
agentic environment inversion
scalable task generation
environment-intensive tasks
LiberCoder
Authors

Yusong Lin (Huawei Technologies Co., Ltd)
Haiyang Wang (Huawei Technologies Co., Ltd)
Shuzhe Wu (Institute of Computing Technology, Chinese Academy of Sciences)
Lue Fan (Institute of Automation, Chinese Academy of Sciences)
Feiyang Pan (Institute of Computing Technology, Chinese Academy of Sciences)
Sanyuan Zhao (Beijing Institute of Technology)
Dandan Tu (Huawei Technologies Co., Ltd)