On Data Engineering for Scaling LLM Terminal Capabilities

📅 2026-02-24
🤖 AI Summary
This work addresses the lack of systematic research and publicly available resources on training data strategies for LLM-based terminal agents. To this end, we propose Terminal-Task-Gen, a lightweight synthetic task generation framework that supports both seed-based and skill-based task construction. We present a systematic analysis of how data filtering, curriculum learning, long-context training, and behavioral augmentation affect performance on terminal tasks. Using this framework, we construct Terminal-Corpus, a large-scale open-source dataset used to train the Nemotron-Terminal model series (initialized from Qwen3). On Terminal-Bench 2.0, the 32B variant improves from 3.4% to 27.4%, rivaling significantly larger models. Model checkpoints and most of the synthetic datasets are publicly released.
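To make "data filtering" and "curriculum learning" concrete, here is a minimal, generic sketch of both applied to synthetic terminal trajectories. It is not the authors' pipeline: the `Trajectory` record, its `passed_tests` and `num_turns` fields, the `max_turns` budget, and the choice of turn count as a difficulty proxy are all hypothetical assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    # Hypothetical record for one synthetic terminal task and an agent rollout on it.
    task_id: str
    passed_tests: bool  # assumed: whether the rollout passed the task's verification tests
    num_turns: int      # assumed: number of agent turns, used here as a crude difficulty proxy


def filter_and_order(trajs: list[Trajectory], max_turns: int) -> list[Trajectory]:
    """Filter, then order easiest-first (a simple curriculum).

    Filtering step (hypothetical criterion): keep only rollouts that passed
    verification and fit within a turn budget.
    Curriculum step: sort by the difficulty proxy so shorter tasks come first;
    Python's sort is stable, so ties keep their original order.
    """
    kept = [t for t in trajs if t.passed_tests and t.num_turns <= max_turns]
    return sorted(kept, key=lambda t: t.num_turns)
```

The paper studies which filtering and ordering choices actually help; this sketch only shows the shape of such a step, and real criteria may differ substantially (for example, keeping failed rollouts or using a staged rather than fully sorted schedule).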

📝 Abstract
Despite rapid recent progress in the terminal capabilities of large language models, the training data strategies behind state-of-the-art terminal agents remain largely undisclosed. We address this gap through a systematic study of data engineering practices for terminal agents, making two key contributions: (1) Terminal-Task-Gen, a lightweight synthetic task generation pipeline that supports seed-based and skill-based task construction, and (2) a comprehensive analysis of data and training strategies, including filtering, curriculum learning, long-context training, and scaling behavior. Our pipeline yields Terminal-Corpus, a large-scale open-source dataset for terminal tasks. Using this dataset, we train Nemotron-Terminal, a family of models initialized from Qwen3 (8B, 14B, 32B) that achieve substantial gains on Terminal-Bench 2.0: Nemotron-Terminal-8B improves from 2.5% to 13.0%, Nemotron-Terminal-14B from 4.0% to 20.2%, and Nemotron-Terminal-32B from 3.4% to 27.4%, matching the performance of significantly larger models. To accelerate research in this domain, we open-source our model checkpoints and most of our synthetic datasets at https://huggingface.co/collections/nvidia/nemotron-terminal.

Problem
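The abstract reports scores as percentages, so it can help to separate the absolute gain (in percentage points) from the relative improvement. The short sketch below does that arithmetic using only the numbers stated above; the `REPORTED` table and the `gains` helper are illustrative names, not part of the released code.

```python
# Terminal-Bench 2.0 scores (%) as stated in the abstract:
# (base Qwen3 model, Nemotron-Terminal after training)
REPORTED = {
    "8B": (2.5, 13.0),
    "14B": (4.0, 20.2),
    "32B": (3.4, 27.4),
}


def gains(base: float, tuned: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative multiplier)."""
    # Rounding hides floating-point noise such as 27.4 - 3.4 = 23.999...
    return round(tuned - base, 1), round(tuned / base, 2)


if __name__ == "__main__":
    for size, (base, tuned) in REPORTED.items():
        delta, ratio = gains(base, tuned)
        print(f"{size}: +{delta} points ({ratio}x the base score)")
```

Read this way, the 32B model gains about 24 points, roughly an 8x improvement over its base score, while the 8B model gains about 10.5 points, roughly 5x.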

Research questions and friction points this paper is trying to address.

data engineering
large language models
terminal agents
training data strategies
synthetic task generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic task generation
data engineering
terminal agents
curriculum learning
large language models