🤖 AI Summary
Automated environment configuration remains a persistent challenge in software engineering, and even state-of-the-art large language models show limited capability at it. This paper proposes a lightweight, efficient solution: an end-to-end framework for environment setup built on the Qwen3-8B model, combining supervised fine-tuning with Reinforcement Learning with Verifiable Rewards (RLVR). The approach significantly improves the correctness and task-specific adaptability of the generated Bash scripts. On the EnvBench-Python benchmark, the tuned model performs on par with the much larger Qwen3-32B and GPT-4o while remaining small enough to run on consumer-grade hardware. The training code and model checkpoints are publicly released, supporting reproducible experimentation and low-resource environment configuration.
📝 Abstract
Environment setup, the process of configuring a system to work with a specific software project, represents a persistent challenge in Software Engineering (SE). Automated environment setup methods could assist developers by providing fully configured environments for arbitrary repositories without manual effort; they would also help SE researchers scale execution-based benchmarks. However, recent studies reveal that even state-of-the-art Large Language Models (LLMs) achieve limited success in automating this task. To address this limitation, we tune a specialized model for environment setup. We combine supervised fine-tuning, which teaches the model to generate correct Bash scripts, with Reinforcement Learning with Verifiable Rewards (RLVR), which adapts it to the environment setup task. On EnvBench-Python, our method enables Qwen3-8B (a model runnable on consumer hardware) to perform on par with larger models, Qwen3-32B and GPT-4o. The training code and model checkpoints are available online: https://github.com/JetBrains-Research/PIPer.
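The key ingredient of RLVR is a reward that can be checked automatically rather than judged by a model. For environment setup, a natural verifiable signal is whether a generated Bash script actually executes successfully. The sketch below illustrates this idea under simplifying assumptions; it is not the paper's exact reward (which would additionally run repository health checks, e.g. import resolution, inside the configured environment) and the function name `verifiable_reward` is our own.

```python
import os
import subprocess
import tempfile


def verifiable_reward(script: str, timeout: int = 60) -> float:
    """Binary, automatically checkable reward for a candidate setup script.

    Illustrative sketch: reward 1.0 if the script exits with status 0
    within the time limit, 0.0 otherwise. A production reward for
    environment setup would run further checks (e.g., that the
    project's dependencies resolve) in the resulting environment.
    """
    # Write the generated script to a temporary file so bash can run it.
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script)
        path = f.name
    try:
        result = subprocess.run(
            ["bash", path], capture_output=True, timeout=timeout
        )
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        # Hanging scripts (e.g., waiting for interactive input) get no reward.
        return 0.0
    finally:
        os.unlink(path)
```

During RLVR training, such a reward is computed for each sampled script and used directly as the policy-gradient signal, with no learned reward model in the loop.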