PIPer: On-Device Environment Setup via Online Reinforcement Learning

📅 2025-09-29
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Automated environment setup remains a persistent challenge in software engineering, and existing large language models have only limited success at it. This paper tunes a specialized model for the task: Qwen3-8B is trained with supervised fine-tuning to generate correct Bash scripts, combined with Reinforcement Learning with Verifiable Rewards (RLVR) to adapt it to environment setup. On the EnvBench-Python benchmark, the resulting model performs on par with the larger Qwen3-32B and GPT-4o while remaining runnable on consumer-grade hardware. The training code and model checkpoints are publicly released to support reproducible experiments.

📝 Abstract
Environment setup, the process of configuring a system to work with a specific software project, represents a persistent challenge in Software Engineering (SE). Automated environment setup methods could assist developers by providing fully configured environments for arbitrary repositories without manual effort. They would also help SE researchers scale execution-based benchmarks. However, recent studies reveal that even state-of-the-art Large Language Models (LLMs) achieve limited success in automating this task. To address this limitation, we tune a specialized model for environment setup. We combine supervised fine-tuning for generating correct Bash scripts with Reinforcement Learning with Verifiable Rewards (RLVR) to adapt the model to the task of environment setup. On EnvBench-Python, our method enables Qwen3-8B (a model runnable on consumer hardware) to perform on par with larger models, namely Qwen3-32B and GPT-4o. The training code and model checkpoints are available online: https://github.com/JetBrains-Research/PIPer.
Problem

Research questions and friction points this paper is trying to address.

Automating the environment setup process for software projects
Addressing the limited success of large language models at environment setup
Enabling an on-device model to match the performance of larger models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines supervised fine-tuning with reinforcement learning
Uses verifiable rewards to adapt Bash script generation to environment setup
Enables a small model to match the performance of larger models
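
To make the idea of a verifiable reward concrete, here is a minimal Python sketch. It is not the paper's reward design, which this summary does not describe. The names `CheckResult`, `verifiable_reward`, and `fake_check` are hypothetical, and the choice of signals (a script exit code and a count of unresolved imports) is an assumption made for illustration. The sketch only shows the general shape: a programmatic check of a generated Bash script is mapped to a scalar reward, with no learned judge involved.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CheckResult:
    """Outcome of checking a setup script (hypothetical shape, not from the paper)."""
    exit_code: int           # 0 means the script ran to completion
    unresolved_imports: int  # imports still failing after setup (assumed signal)


def verifiable_reward(script: str, check: Callable[[str], CheckResult]) -> float:
    """Map a programmatic check of `script` to a scalar reward in [0, 1]."""
    if not script.strip():
        return 0.0  # an empty script cannot configure anything
    result = check(script)
    if result.exit_code != 0:
        return 0.0  # the script itself failed
    # Full reward when nothing is unresolved; decays as more imports fail.
    return 1.0 / (1.0 + result.unresolved_imports)


def fake_check(script: str) -> CheckResult:
    """Stand-in verifier for illustration; a real one would run the script in a sandbox."""
    if "pip install" in script:
        return CheckResult(exit_code=0, unresolved_imports=0)
    return CheckResult(exit_code=0, unresolved_imports=3)


if __name__ == "__main__":
    print(verifiable_reward("pip install -r requirements.txt", fake_check))
    print(verifiable_reward("echo 'nothing installed'", fake_check))
```

Passing the verifier in as a callable keeps the reward function independent of how scripts are actually executed, so the stand-in can be swapped for a sandboxed runner without touching the scoring logic.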