R-Log: Incentivizing Log Analysis Capability in LLMs via Reasoning-based Reinforcement Learning

πŸ“… 2025-09-30
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the poor domain adaptation, overfitting, and hallucination induced by supervised fine-tuning (SFT) in log analysis, this paper proposes R-Log, a framework that combines operational expertise with reinforcement learning (RL). Methodologically, it (i) encodes human O&M strategies as reasoning trajectories to guide a cold-start initialization; (ii) constructs a simulated operational environment with a multi-task joint reward mechanism for RL-based optimization, mitigating context redundancy and answer drift; and (iii) spans strategy-driven SFT, trajectory-guided cold-start, environment-aware RL training, and a lightweight R-Log-fast deployment variant. Evaluated on five real-world log analysis tasks, R-Log improves accuracy on unseen scenarios by 228.05%, and R-Log-fast achieves a 5× inference speedup while retaining 93% of the full model's efficacy, significantly enhancing generalization and practical deployability.

πŸ“ Abstract
The growing complexity of log data in modern software systems has prompted the use of Large Language Models (LLMs) for automated log analysis. Current approaches typically rely on direct supervised fine-tuning (SFT) on log-label pairs. However, this exacerbates the domain discrepancy between general-purpose LLMs and specialized log data, causing overfitting. Furthermore, SFT's imbalanced loss computation often allows lengthy contexts to overwhelm critical, concise details in model answers, leading to hallucinations. To address these limitations, we propose R-Log, a novel reasoning-based paradigm that mirrors the structured, step-by-step analytical process of human engineers. This approach enhances generalizability by learning the underlying rules behind conclusions. We further employ Reinforcement Learning (RL) to optimize the model within a simulated O&M environment, thereby reducing hallucinations by directly rewarding correct outcomes. R-Log is first cold-started on a curated dataset of 2k+ reasoning trajectories, guided by 13 strategies from manual O&M practices, to establish an initial reasoning capability. This ability is then refined via RL using a joint reward function. Empirical evaluations on real-world logs show that R-Log outperforms existing methods across five log analysis tasks, particularly in unseen scenarios (by 228.05%). We also designed R-Log-fast with 5x speedup while keeping 93% of the efficacy.
Problem

Research questions and friction points this paper is trying to address.

Addresses domain gap between general LLMs and specialized log data
Reduces hallucinations by rewarding correct outcomes through reinforcement learning
Improves generalization in log analysis tasks for unseen scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning optimizes model in simulated environment
Reasoning-based paradigm mimics human analytical process
Cold-start training with curated reasoning trajectories dataset
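The paper's joint reward function is not detailed on this page. As an illustration only, a multi-task joint reward that directly rewards correct outcomes (as the abstract describes) while lightly encouraging a reasoning trace might be sketched as follows; all names, weights, and the format-bonus term are hypothetical, not the authors' actual design:

```python
def joint_reward(task, prediction, label, has_reasoning, task_weights=None):
    """Hypothetical multi-task joint reward for RL on log analysis tasks.

    Combines an outcome term (exact-match correctness, mirroring the
    paper's idea of rewarding correct outcomes to reduce hallucination)
    with a small format term that checks whether the model produced a
    step-by-step reasoning trace before its answer.
    """
    weights = task_weights or {}                 # optional per-task weighting
    w = weights.get(task, 1.0)                   # default weight 1.0
    outcome = 1.0 if prediction.strip() == label.strip() else 0.0
    fmt_bonus = 0.1 if has_reasoning else 0.0    # small reasoning-trace bonus
    return w * outcome + fmt_bonus

# Example: a correct anomaly-detection answer with a reasoning trace
score = joint_reward("anomaly_detection", "anomalous", "anomalous", True)
```

A per-task weight table lets rarer or harder tasks (e.g. root cause analysis) contribute more to the policy gradient than abundant easy ones, which is one plausible way to balance a multi-task reward.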
πŸ”Ž Similar Papers
No similar papers found.
Yilun Liu
Nankai University, China
Ziang Chen
Nankai University, China
Song Xu
JD AI Research
natural language processing · text generation · recommender systems
Minggui He
Huawei, China
Shimin Tao
2012 Lab, Huawei Co., Ltd.
Machine Translation · AIOps · Log Analysis
Weibin Meng
Huawei, China
Yuming Xie
Huawei, China
Tao Han
Huawei, China
Chunguang Zhao
Huawei, China
Jingzhou Du
Huawei, China
Daimeng Wei
Huawei, China
Shenglin Zhang
Nankai University
AI Operations in general
Yongqian Sun
Nankai University
AIOps · Anomaly Detection · Failure Localization · Microservices Fault Diagnosis · Root Cause Analysis