🤖 AI Summary
Existing log anomaly detection methods face dual challenges: conventional deep learning models suffer from limited interpretability and generalization, while large language models (LLMs) exhibit hallucinations, factual inaccuracies, and poor reliability. To address these issues, this paper proposes the first unified framework integrating chain-of-thought-guided supervised fine-tuning (CoT-SFT) with multi-dimensional reward reinforcement learning (RL). We construct a high-quality training dataset via expert calibration and design a dual-objective reward mechanism that jointly optimizes accuracy and logical consistency, significantly enhancing reasoning coherence and suppressing hallucinations. Evaluated on mainstream benchmarks, our method achieves state-of-the-art F1 scores and generates verifiable, step-by-step reasoning traces. The source code and dataset are publicly released.
📝 Abstract
Logs constitute a form of evidence signaling the operational status of software systems, and automated log anomaly detection is crucial for ensuring their reliability. However, existing approaches face significant limitations: traditional deep learning models lack interpretability and generalization, while methods leveraging Large Language Models are often hindered by unreliability and factual inaccuracies. To address these issues, we propose RationAnomaly, a novel framework that enhances log anomaly detection by synergizing Chain-of-Thought (CoT) fine-tuning with reinforcement learning. Our approach first instills expert-like reasoning patterns using CoT-guided supervised fine-tuning, grounded in a high-quality dataset corrected through a rigorous expert-driven process. Subsequently, a reinforcement learning phase with a multi-faceted reward function optimizes for accuracy and logical consistency, effectively mitigating hallucinations. Experimentally, RationAnomaly outperforms state-of-the-art baselines, achieving superior F1-scores on key benchmarks while providing transparent, step-by-step analytical outputs. We have released the corresponding resources, including code and datasets.
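To give a concrete sense of what a multi-faceted reward of this kind might look like, here is a minimal Python sketch that combines a verdict-accuracy term with a crude logical-consistency term. The tag names (`<think>`, `<answer>`), weights, and heuristics are illustrative assumptions for exposition, not the paper's actual reward design.

```python
import re

def combined_reward(response: str, true_label: str,
                    w_acc: float = 0.8, w_logic: float = 0.2) -> float:
    """Score a model response on (a) verdict correctness and
    (b) presence of a coherent rationale preceding the verdict.

    Hypothetical sketch: weights and checks are assumptions,
    not RationAnomaly's published reward function.
    """
    # Accuracy term: does the final verdict match the ground-truth label?
    m = re.search(r"<answer>\s*(normal|anomaly)\s*</answer>", response, re.I)
    acc = 1.0 if (m and m.group(1).lower() == true_label) else 0.0

    # Logical-consistency term (proxy): a reasoning trace must exist
    # and must appear before the final answer.
    has_rationale = bool(re.search(r"<think>.*</think>", response, re.S))
    ordered = has_rationale and response.find("</think>") < response.find("<answer>")
    logic = 1.0 if ordered else 0.0

    # Dual-objective reward: weighted sum of accuracy and consistency.
    return w_acc * acc + w_logic * logic

good = ("<think>Repeated auth failures from one host suggest brute force."
        "</think><answer>anomaly</answer>")
bare = "<answer>anomaly</answer>"
print(combined_reward(good, "anomaly"))  # correct verdict + rationale -> 1.0
print(combined_reward(bare, "anomaly"))  # correct verdict, no rationale -> 0.8
```

A reward shaped this way penalizes answer-only outputs even when the label is right, which is one simple mechanism for pushing a policy toward verifiable step-by-step traces rather than unsupported verdicts.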