🤖 AI Summary
This work addresses the limitations of existing microservice autoscaling approaches, which rely either on heavily trained black-box models or on handcrafted rules that fail to generalize in dynamic environments. For the first time, it integrates large language models (LLMs) with chain-of-thought reasoning to diagnose performance bottlenecks and recommend resource allocations by translating runtime telemetry into natural-language state descriptions. The proposed method generates interpretable reasoning traces without requiring task-specific fine-tuning, enabling few-shot, generalizable, and safe scheduling decisions. Experimental evaluation on open-source microservice workloads demonstrates a 15% improvement in root-cause identification accuracy, a 24× reduction in training overhead, and a 6% improvement in short-term quality of service.
📝 Abstract
Applications are moving away from monolithic designs toward microservice and serverless architectures, where fleets of lightweight, independently deployable components run on public clouds. Autoscaling serves as the primary control mechanism for balancing resource utilization and quality of service, yet existing policies are either opaque learned models that require substantial per-deployment training or brittle hand-tuned rules that fail to generalize. We investigate whether large language models can act as universal few-shot resource allocators that adapt across rapidly evolving microservice deployments. We propose ORACL (Optimized Reasoning for Autoscaling via Chain of Thought with LLMs for Microservices), a framework that leverages prior knowledge and chain-of-thought reasoning to diagnose performance regressions and recommend resource allocations. ORACL transforms runtime telemetry, including pods, replicas, CPU and memory usage, latency, service-level objectives, and fault signals, into semantic natural-language state descriptions and invokes an LLM to produce an interpretable intermediate reasoning trace. This reasoning identifies likely root causes, prunes the action space, and issues safe allocation decisions under policy constraints. Experiments on representative open-source microservice workloads show that ORACL improves root-cause identification accuracy by 15 percent, accelerates training by up to 24×, and improves quality of service by 6 percent in short-term scenarios, without deployment-specific retraining.
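The abstract's pipeline, rendering telemetry as a natural-language state, prompting an LLM for chain-of-thought reasoning, and constraining its output to safe actions, can be illustrated with a minimal sketch. All names here (`ServiceTelemetry`, `describe_state`, `build_prompt`, `parse_action`, the action set) are hypothetical illustrations, not ORACL's actual API, and the LLM call itself is left abstract:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ServiceTelemetry:
    """Hypothetical container for the runtime signals the abstract lists."""
    service: str
    replicas: int
    cpu_util: float          # fraction of CPU limit, 0..1
    mem_util: float          # fraction of memory limit, 0..1
    p95_latency_ms: float
    slo_latency_ms: float
    fault_signal: Optional[str] = None

def describe_state(t: ServiceTelemetry) -> str:
    """Render raw telemetry as a semantic natural-language state description."""
    lines = [
        f"Service '{t.service}' runs {t.replicas} replica(s).",
        f"CPU utilization is {t.cpu_util:.0%}; memory utilization is {t.mem_util:.0%}.",
        f"Observed p95 latency is {t.p95_latency_ms:.0f} ms against an SLO of "
        f"{t.slo_latency_ms:.0f} ms.",
    ]
    if t.fault_signal:
        lines.append(f"Fault signal observed: {t.fault_signal}.")
    return " ".join(lines)

# A pruned, policy-approved action space (illustrative, not ORACL's).
ALLOWED_ACTIONS = {"scale_out", "scale_in", "hold"}

def build_prompt(t: ServiceTelemetry) -> str:
    """Compose a chain-of-thought prompt constrained to the safe action set."""
    return (
        "You are a microservice autoscaling assistant.\n"
        f"State: {describe_state(t)}\n"
        "Reason step by step about the likely root cause, then answer with "
        f"exactly one action from {sorted(ALLOWED_ACTIONS)}.\n"
        "Answer format: ACTION: <action>"
    )

def parse_action(llm_output: str, current_replicas: int, max_replicas: int = 10) -> str:
    """Extract the action and enforce guardrails; fall back to 'hold' when unsafe."""
    for line in llm_output.splitlines():
        if line.startswith("ACTION:"):
            action = line.split(":", 1)[1].strip()
            if action not in ALLOWED_ACTIONS:
                return "hold"
            if action == "scale_out" and current_replicas >= max_replicas:
                return "hold"
            if action == "scale_in" and current_replicas <= 1:
                return "hold"
            return action
    return "hold"  # unparseable output never triggers a scaling change
```

The guardrail in `parse_action` reflects the abstract's "safe allocation decisions under policy constraints": the LLM's free-form reasoning trace is kept for interpretability, but only a validated action from the pruned space can reach the cluster.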