🤖 AI Summary
To address the infeasibility of manually analyzing logs from large-scale IT systems, this paper proposes a lightweight log analysis framework built on large language models (LLMs). The method introduces a CPU-efficient inference mechanism that substantially improves LLM throughput on resource-constrained hardware without compromising output quality. It integrates log parsing, contextual modeling, and fault-oriented semantic reasoning to enable end-to-end automated diagnosis. Deployed in production, the system supports 70 software products and has processed over 2,000 incident tickets. Compared with conventional log analysis practices, the deployment is reported to save more than 300 person-hours and approximately USD 15,444 in manpower costs per month. The framework thus advances practical, scalable, and cost-effective LLM-based log analytics for real-world operational environments.
📝 Abstract
IT environments typically include logging mechanisms to monitor system health and detect issues. However, the sheer volume of generated logs makes manual inspection impractical, underscoring the importance of automated log analysis in IT software support. In this paper, we propose a log analytics tool that leverages Large Language Models (LLMs) to process log data and diagnose issues, generating automated insights and summaries. We further present a novel approach for running LLMs efficiently on CPUs, processing massive log volumes in minimal time without compromising output quality. We share insights and lessons learned from deploying the tool, which has been in production since March 2024, has been scaled across 70 software products, and has processed over 2,000 tickets for issue diagnosis. Compared with traditional log analysis practices, it has achieved time savings of 300+ man-hours and an estimated $15,444 per month in manpower costs.
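The abstract does not say how raw logs are reduced before they reach the LLM. As a generic illustration only, and not the tool's published method, one common way to shrink a large log volume is to mask variable tokens so that repeated messages collapse into a small set of templates with occurrence counts. The masking rules and the function names `to_template` and `summarize_templates` below are hypothetical.

```python
import re
from collections import Counter

# Hypothetical masking rules (not from the paper): replace tokens that usually
# differ between otherwise identical messages with fixed placeholders.
# Order matters: IPs and hex IDs are masked before plain numbers.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Return a template for one raw log line by masking variable tokens."""
    template = line.strip()
    for pattern, placeholder in MASKS:
        template = pattern.sub(placeholder, template)
    return template


def summarize_templates(lines):
    """Collapse raw lines into (template, count) pairs, most frequent first.

    Blank lines are skipped.
    """
    counts = Counter(to_template(line) for line in lines if line.strip())
    return counts.most_common()


if __name__ == "__main__":
    logs = [
        "ERROR Connection to 10.0.0.5 timed out after 30 s",
        "ERROR Connection to 10.0.0.7 timed out after 45 s",
        "INFO Worker 3 started",
    ]
    for template, count in summarize_templates(logs):
        print(count, template)
```

In this sketch the three input lines reduce to two templates. A compact (template, count) list of this kind is much smaller than the raw log, which is the general motivation for summarizing logs before prompting a model.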