🤖 AI Summary
Existing microservice resource scheduling approaches struggle to adapt to the dynamic evolution of runtime call graphs, often resulting in either resource wastage or violations of service-level objectives (SLOs). This work reveals, for the first time, the high concentration of invocation paths in large-scale production environments and introduces a joint optimization framework based on structural fingerprint decomposition and pattern-aware modeling. By identifying stable backbones and deviating subgraphs within call graphs through structural fingerprints, and integrating invocation pattern clustering to predict workload distributions, the proposed method constructs a global resource allocation model that satisfies end-to-end tail-latency SLOs. Evaluated on the TrainTicket benchmark, the approach reduces CPU consumption by 35–38% compared to state-of-the-art baselines while maintaining a 98.8% SLO compliance rate.
📝 Abstract
Modern microservice systems exhibit continuous structural evolution in their runtime call graphs due to workload fluctuations, fault responses, and deployment activities. Despite this complexity, our analysis of over 500,000 production traces from ByteDance reveals a latent regularity: execution paths concentrate around a small set of recurring invocation patterns. However, existing resource management approaches fail to exploit this structure. Industrial autoscalers like Kubernetes HPA ignore inter-service dependencies, while recent academic methods often assume static topologies, rendering them ineffective under dynamic execution contexts. In this work, we propose Morphis, a dependency-aware provisioning framework that unifies pattern-aware trace analysis with global optimization. It introduces structural fingerprinting, which decomposes traces into a stable execution backbone and interpretable deviation subgraphs. Resource allocation is then formulated as a constrained optimization problem over predicted pattern distributions, jointly minimizing aggregate CPU usage while satisfying end-to-end tail-latency SLOs. Our extensive evaluations on the TrainTicket benchmark demonstrate that Morphis reduces CPU consumption by 35–38% compared to state-of-the-art baselines while maintaining 98.8% SLO compliance.
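To make the optimization formulation concrete, the following is a minimal, hypothetical sketch (not the paper's actual algorithm): it assumes a simple per-service latency model `latency ≈ work / cpu`, treats each invocation pattern as a path of services with a predicted frequency, splits the end-to-end latency budget evenly across a path's hops, and takes the per-service maximum of the resulting CPU demands. All names (`allocate`, `work`, `slo_ms`) and the latency model are illustrative assumptions.

```python
def allocate(patterns, work, slo_ms):
    """SLO-constrained CPU allocation over invocation patterns (toy model).

    patterns: {name: (path, weight)} where path is a list of service names
              and weight is the predicted pattern frequency (unused in this
              minimal version; a fuller model would weight shared budgets).
    work:     {service: work units}, with per-service latency = work / cpu.
    slo_ms:   end-to-end tail-latency target for every pattern.
    """
    alloc = {svc: 0.0 for svc in work}
    for path, _weight in patterns.values():
        budget = slo_ms / len(path)       # equal per-hop latency budget
        for svc in path:
            need = work[svc] / budget     # smallest cpu with work/cpu <= budget
            alloc[svc] = max(alloc[svc], need)  # serve the most demanding pattern
    return alloc
```

For example, with `work = {"A": 10, "B": 20}` and a single pattern `["A", "B"]` under a 10 ms SLO, each hop gets a 5 ms budget, yielding allocations of 2.0 and 4.0 CPU units respectively. The real formulation jointly minimizes aggregate CPU across all patterns rather than budgeting each path independently.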