🤖 AI Summary
Kubernetes management suffers from API complexity, fragmented tooling, and high configuration barriers. To address these challenges, we propose the first LLM-driven intelligent control framework supporting the full Kubernetes lifecycle. Our approach employs a modular agent architecture organized by functional domains, integrating secure code generation, workflow-aware memory, human-in-the-loop clarification, and dynamic tool composition. Together these enable interpretable, extensible, end-to-end natural language operations (read, write, execute, and RBAC enforcement). Unlike prior work, the framework provides fully controllable and auditable Kubernetes governance across the entire operational spectrum. Evaluation on 200 real-world natural language queries achieves 100% operational reliability and a 93% tool generation success rate. The framework is open-sourced and validated via both Azure-hosted interactive demos and local kind-based deployments.
📝 Abstract
Kubernetes has become the foundation of modern cloud-native infrastructure, yet its management remains complex and fragmented. Administrators must navigate a vast API surface, manage heterogeneous workloads, and coordinate tasks across disconnected tools, often requiring precise commands, YAML configuration, and contextual expertise.
This paper presents KubeIntellect, a Large Language Model (LLM)-powered system for intelligent, end-to-end Kubernetes control. Unlike existing tools that focus on observability or static automation, KubeIntellect supports natural language interaction across the full spectrum of Kubernetes API operations, including read, write, delete, exec, access control, lifecycle, and advanced verbs. The system uses modular agents aligned with functional domains (e.g., logs, metrics, RBAC), orchestrated by a supervisor that interprets user queries, maintains workflow memory, and either invokes reusable tools or synthesizes new ones via a secure Code Generator Agent.
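As a rough illustration of the orchestration pattern described above, the following minimal sketch shows a supervisor that routes a query to a domain agent, reuses an existing tool when one matches, and otherwise falls back to a tool-synthesis hook. All names here (`Agent`, `Supervisor`, `synthesize`) are hypothetical and are not taken from the KubeIntellect codebase; plain keyword matching stands in for the LLM-based query interpretation, and the synthesis hook stands in for the Code Generator Agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A tool takes the user query and returns a textual result (hypothetical shape).
Tool = Callable[[str], str]


@dataclass
class Agent:
    """A domain agent (e.g. logs, metrics, RBAC) holding reusable tools."""
    domain: str
    keywords: Tuple[str, ...]
    tools: Dict[str, Tool] = field(default_factory=dict)

    def handle(self, query: str) -> Optional[str]:
        # Reuse the first registered tool whose name appears in the query.
        for name, tool in self.tools.items():
            if name in query.lower():
                return tool(query)
        return None  # no reusable tool fits this query


@dataclass
class Supervisor:
    """Routes queries to agents and falls back to tool synthesis."""
    agents: List[Agent]
    synthesize: Callable[[str], Tool]  # stand-in for a code generator
    memory: List[Tuple[str, str]] = field(default_factory=list)

    def route(self, query: str) -> Optional[Agent]:
        # Assumption: keyword matching replaces LLM-based interpretation.
        q = query.lower()
        for agent in self.agents:
            if any(k in q for k in agent.keywords):
                return agent
        return None

    def run(self, query: str) -> str:
        agent = self.route(query)
        result = agent.handle(query) if agent else None
        source = agent.domain if agent and result is not None else "codegen"
        if result is None:
            # No existing tool matched: synthesize one and run it.
            result = self.synthesize(query)(query)
        # Workflow memory records which path served each query.
        self.memory.append((query, source))
        return result


logs_agent = Agent("logs", ("log",), {"logs": lambda q: "logs-output"})
supervisor = Supervisor([logs_agent], lambda q: (lambda x: "generated:" + x))
print(supervisor.run("show logs for pod web"))
print(supervisor.run("drain node a"))
```

Recording the serving path in `memory` is one simple way to keep each decision auditable; a real system would also need sandboxing and validation before running any synthesized tool.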
KubeIntellect integrates memory checkpoints, human-in-the-loop clarification, and dynamic task sequencing into a structured orchestration framework. Evaluation results show a 93% tool synthesis success rate and 100% reliability across 200 natural language queries, demonstrating the system's ability to operate efficiently under diverse workloads. An automated demo environment is provided on Azure, with additional support for local testing via kind. This work introduces a new class of interpretable, extensible, and LLM-driven systems for managing complex infrastructure.
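One way to picture how memory checkpoints and human-in-the-loop clarification could combine with task sequencing is the sketch below. It is an illustrative assumption, not the paper's implementation: `Step`, `Workflow`, and the `ask` callback are hypothetical names, and step execution is reduced to appending the step name to a log.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    name: str
    required: Optional[str] = None  # parameter the step needs, e.g. "namespace"


@dataclass
class Workflow:
    steps: List[Step]
    params: Dict[str, str] = field(default_factory=dict)
    checkpoint: int = 0  # index of the next step to run
    log: List[str] = field(default_factory=list)

    def run(self, ask: Callable[[str], Optional[str]]) -> bool:
        """Run steps in order; return False if paused awaiting clarification."""
        while self.checkpoint < len(self.steps):
            step = self.steps[self.checkpoint]
            if step.required and step.required not in self.params:
                answer = ask(step.required)  # human-in-the-loop question
                if answer is None:
                    return False  # paused; the checkpoint is preserved
                self.params[step.required] = answer
            self.log.append(step.name)  # stand-in for executing the step
            self.checkpoint += 1        # advance only after success
        return True


wf = Workflow([Step("list_pods"), Step("delete_pod", required="namespace")])
wf.run(lambda param: None)       # user gives no answer: pauses at step 2
wf.run(lambda param: "staging")  # resumes from the checkpoint and finishes
print(wf.log)
```

Because the checkpoint advances only after a step completes, a paused workflow resumes without repeating earlier steps, which matters for write or delete operations that should not run twice.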