SiriusHelper: An LLM Agent-Based Operations Assistant for Big Data Platforms

📅 2026-04-29
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work proposes a unified LLM-agent-based operations and maintenance (O&M) assistant framework to address three limitations of LLM assistants in big data platform management: insufficient scenario coverage, inefficient knowledge retrieval, and high maintenance costs. The framework uses intent recognition to route queries to specialized processing pipelines, pairs a priority-tiered knowledge base with a DeepSearch-driven multi-hop retrieval mechanism, and adds an automated pipeline for failure diagnosis and domain-specific standard operating procedure (SOP) distillation, enabling closed-loop knowledge updates. Combining retrieval-augmented generation (RAG), ticket parsing, and structured SOP generation, the system has been deployed on Tencent's big data platform, where it outperforms representative alternatives and reduces online support tickets by 20.8%.
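The intent-routing step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the keyword rules stand in for an LLM-based intent recognizer, and every name here (`classify_intent`, `route`, the handler functions, the intent labels) is a hypothetical choice made for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical handlers; the paper does not publish these names.
def handle_sql_diagnosis(query: str) -> str:
    # Placeholder for a dedicated expert workflow (e.g., SQL execution diagnosis).
    return f"[sql_diagnosis] {query}"

def handle_general_consultation(query: str) -> str:
    # Placeholder for the general RAG-based consultation path.
    return f"[general] {query}"

# Assumption: simple keyword rules replace the LLM intent recognizer.
INTENT_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "sql_diagnosis": ("sql", "query failed", "execution error"),
}

HANDLERS: Dict[str, Callable[[str], str]] = {
    "sql_diagnosis": handle_sql_diagnosis,
    "general": handle_general_consultation,
}

def classify_intent(query: str) -> str:
    """Return the first intent whose keywords appear in the query, else 'general'."""
    text = query.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "general"

def route(query: str) -> str:
    """Dispatch the query to the handler registered for its intent."""
    return HANDLERS[classify_intent(query)](query)
```

Keeping the classifier and the handler table separate means a new specialized scenario can be added by registering one more entry in each, without touching the dispatch logic.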
📝 Abstract
Big data platforms are widely used in modern enterprises, and an in-production intelligent assistant is increasingly important to help users quickly find actionable guidance and reduce operational burden. While recent LLM+RAG assistants provide a natural interface, they face practical challenges in real deployments: limited scenario coverage across both general consultation and domain-specific troubleshooting workflows, inefficient knowledge access due to inadequate multi-hop retrieval and flat knowledge organization, and high maintenance cost because escalated tickets are unstructured and hard to convert into assistant improvements and reusable SOPs. In this paper, we present SiriusHelper, a deployed intelligent assistant for big data platforms. SiriusHelper serves as a unified online assistant that automatically identifies user intent and routes queries to the right handling path, including dedicated expert workflows for specialized scenarios (e.g., SQL execution diagnosis). To support complex troubleshooting, SiriusHelper combines a DeepSearch-driven mechanism with a priority-based hierarchical knowledge base to enable multi-hop retrieval without context overload, thus improving answer reliability and latency. To reduce expert overhead, SiriusHelper further introduces automated ticket understanding and SOP distillation: it diagnoses the assistant failure reason (e.g., missing knowledge or wrong routing) and extracts domain-specific SOPs to continuously enrich the knowledge base. Experiments and online deployment on the Tencent Big Data platform show that SiriusHelper outperforms representative alternatives and reduces online ticket volume by 20.8%.
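One way to picture multi-hop retrieval over a priority-based knowledge base without context overload is the sketch below. It is an assumption-laden illustration, not SiriusHelper's DeepSearch mechanism: token overlap replaces real retrieval, the `follow_up` field stands in for an LLM-generated next-hop query, lower `tier` numbers are assumed to mean higher priority, and `Doc`, `search_tiered`, and `multi_hop` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass(frozen=True)
class Doc:
    doc_id: str
    tier: int            # assumed convention: lower number = higher priority
    text: str
    follow_up: str = ""  # hypothetical next-hop query hint

def tokens(s: str) -> Set[str]:
    return set(s.lower().split())

def search_tiered(query: str, kb: List[Doc], seen: Set[str]) -> List[Doc]:
    """Return unseen matches from the highest-priority tier that has any hit."""
    q = tokens(query)
    for tier in sorted({d.tier for d in kb}):
        hits = [d for d in kb
                if d.tier == tier and d.doc_id not in seen and q & tokens(d.text)]
        if hits:
            return hits  # lower-priority tiers are skipped once a tier matches
    return []

def multi_hop(query: str, kb: List[Doc], max_hops: int = 3, budget: int = 4) -> List[Doc]:
    """Collect documents hop by hop, stopping at the hop limit or context budget."""
    context: List[Doc] = []
    seen: Set[str] = set()
    current = query
    for _ in range(max_hops):
        hits = search_tiered(current, kb, seen)
        if not hits:
            break
        for d in hits:
            if len(context) >= budget:
                return context  # budget reached: avoid overloading the context
            context.append(d)
            seen.add(d.doc_id)
        next_queries = [d.follow_up for d in hits if d.follow_up]
        if not next_queries:
            break
        current = next_queries[0]
    return context
```

Searching tiers in priority order and capping the collected documents are the two levers that keep the context small here; a real system would likely score relevance rather than rely on exact token overlap.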
Problem

Research questions and friction points this paper is trying to address.

LLM agent
big data platforms
multi-hop retrieval
knowledge organization
ticket understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM agent
multi-hop retrieval
hierarchical knowledge base
SOP distillation
intent routing