🤖 AI Summary
Diagnosing performance problems in distributed applications is hampered by a wide semantic gap between high-level application logic and low-level observability tools. To help bridge this gap, we propose "workflow motifs", a new abstraction whose instances characterize frequently repeating, structured patterns within and among request executions, thereby aligning application semantics with system traces. We formally define workflow motifs and use this definition to identify which frequent-subgraph-mining algorithms (e.g., gSpan) are good starting points for mining them. Applied to HDFS, an early version of workflow motifs suggests performance-optimization points, such as cache misses and redundant RPCs, illustrating the abstraction's interpretability and practical value for performance diagnosis.
📝 Abstract
Diagnosing problems in deployed distributed applications continues to grow more challenging. A significant reason is the extreme mismatch between the powerful abstractions developers have available to build increasingly complex distributed applications and the simple ones engineers have available to diagnose problems in them. To help, we present a novel abstraction, the workflow motif, whose instances represent characteristics of frequently repeating patterns within and among request executions. We argue that workflow motifs will benefit many diagnosis tasks, formally define them, and use this definition to identify which frequent-subgraph-mining algorithms are good starting points for mining workflow motifs. We conclude by using an early version of workflow motifs to suggest performance-optimization points in HDFS.
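The link to frequent subgraph mining can be made concrete with a minimal Python sketch of support counting over request workflow graphs. It is a deliberate simplification and neither gSpan nor the paper's definition of a workflow motif. Patterns are restricted to label sequences along directed paths, each request workflow is assumed to be a small DAG keyed by hypothetical node ids, and the node labels (`recv`, `cache_lookup`, `disk_read`, `reply`) are made up for illustration. Support is the number of request workflows that contain a pattern at least once. This is the graph-transaction notion of support that gSpan also uses.

```python
from collections import Counter
from itertools import chain

# Hypothetical toy representation: node id -> (label, successor node ids).
Workflow = dict[str, tuple[str, list[str]]]


def label_paths(wf: Workflow, length: int) -> set[tuple[str, ...]]:
    """Return every distinct sequence of `length` node labels along a directed path."""
    found: set[tuple[str, ...]] = set()

    def walk(node: str, acc: tuple[str, ...]) -> None:
        label, succs = wf[node]
        acc = acc + (label,)
        if len(acc) == length:
            found.add(acc)
            return
        for succ in succs:
            walk(succ, acc)

    for start in wf:  # a path may start at any node
        walk(start, ())
    return found


def frequent_paths(workflows: list[Workflow], length: int,
                   min_support: int) -> dict[tuple[str, ...], int]:
    """Keep label paths that appear in at least `min_support` workflows."""
    # Each workflow contributes a *set* of patterns, so a pattern counts once per request.
    support = Counter(chain.from_iterable(label_paths(wf, length) for wf in workflows))
    return {p: c for p, c in support.items() if c >= min_support}


# Illustrative requests: two that miss a (hypothetical) cache and one that hits it.
w_miss = {"n1": ("recv", ["n2"]), "n2": ("cache_lookup", ["n3"]),
          "n3": ("disk_read", ["n4"]), "n4": ("reply", [])}
w_hit = {"n1": ("recv", ["n2"]), "n2": ("cache_lookup", ["n4"]), "n4": ("reply", [])}

if __name__ == "__main__":
    print(frequent_paths([w_miss, w_hit, w_miss], length=2, min_support=3))
    # {('recv', 'cache_lookup'): 3}
```

Lowering `min_support` to 2 also surfaces the `cache_lookup -> disk_read -> reply` edges shared by the two miss requests. This hints at how a repeated structure could point at an optimization candidate. Real miners such as gSpan enumerate arbitrary connected subgraphs rather than paths and avoid duplicate work via canonical DFS codes, which this sketch does not attempt.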