An End-to-End Framework for Building Large Language Models for Software Operations

📅 2026-04-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the inability of existing large language models (LLMs) to support efficient end-to-end intelligent operations, which the authors attribute to low-quality data and fragmented domain knowledge. They propose OpsLLM, a domain-specific LLM for software operations, built with a human-in-the-loop data construction pipeline and a Domain Process Reward Model (DPRM). The model is trained with supervised fine-tuning followed by reinforcement learning, where the DPRM is used to improve accuracy and reliability on root cause analysis. The authors plan to open-source models at three scales (7B, 14B, and 32B parameters) along with a 15K fine-tuning dataset. Experiments report accuracy improvements of 0.2%–5.7% on question-answering tasks and 2.7%–70.3% on root cause analysis tasks over both open-source and closed-source baselines, along with strong transferability.

📝 Abstract

In the field of software operations, Large Language Models (LLMs) have attracted increasing attention. However, existing research has not yet achieved efficient and effective end-to-end intelligent operations due to low-quality data, fragmented knowledge, and insufficient learning. To explore the potential of LLMs in software operations, we propose OpsLLM, a domain-specific LLM that supports both knowledge-based question answering (QA) and root cause analysis (RCA). Moreover, we disclose the detailed workflow for building LLMs specifically for the software operations domain. First, a Human-in-the-Loop mechanism is introduced to curate high-quality data from a large collection of raw operational data and construct a fine-tuning dataset. Then, supervised fine-tuning is conducted on this data to obtain a base model. Furthermore, we introduce a domain process reward model (DPRM) during the reinforcement learning stage to optimize the accuracy and reliability of the fine-tuned model on RCA tasks. Experimental results on tasks of varying difficulty demonstrate that OpsLLM effectively learns and aligns with the infused operational domain knowledge, outperforming existing open-source and closed-source LLMs in accuracy with improvements of 0.2%–5.7% on QA tasks and 2.7%–70.3% on RCA tasks, while exhibiting strong transferability. Moreover, we will open-source three versions of OpsLLM with 7B, 14B, and 32B parameters, along with a 15K fine-tuning dataset.
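
To make the idea of a process reward concrete, the following is a minimal sketch, not the authors' DPRM. It assumes that a process reward model scores each intermediate reasoning step of an RCA trace rather than only the final answer, and that the per-step scores are combined by taking the minimum. The names `Step`, `process_reward`, and `toy_scorer` are hypothetical, and the keyword-based scorer merely stands in for a learned model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One intermediate reasoning step in a root cause analysis trace."""
    text: str


def process_reward(steps: List[Step], score_step: Callable[[Step], float]) -> float:
    """Aggregate per-step scores into a single reward for the whole trace.

    Uses the minimum step score, so one weak step caps the trace reward.
    This aggregation rule is an assumption; the abstract does not specify one.
    """
    if not steps:
        return 0.0
    return min(score_step(s) for s in steps)


def toy_scorer(step: Step) -> float:
    """Hypothetical stand-in for a learned reward model.

    Gives a high score to steps that mention operational evidence
    (logs, metrics, traces) and a low score otherwise.
    """
    evidence_terms = ("log", "metric", "trace")
    text = step.text.lower()
    return 1.0 if any(term in text for term in evidence_terms) else 0.2


if __name__ == "__main__":
    trace = [
        Step("Error logs show repeated connection timeouts"),
        Step("Probably the database"),
    ]
    # The second step cites no evidence, so it limits the trace reward.
    print(process_reward(trace, toy_scorer))
```

In a reinforcement learning loop, a scalar like this could serve as the reward for a sampled RCA trace; how OpsLLM actually computes and applies its reward is not described in the abstract.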

Problem

Research questions and friction points this paper is trying to address.

software operations

Large Language Models

end-to-end intelligent operations

knowledge fragmentation

low-quality data

Innovation

Methods, ideas, or system contributions that make the work stand out.

OpsLLM

Human-in-the-Loop

Domain Process Reward Model