🤖 AI Summary
To address the challenge of efficient span correlation in zero-code tracing of microservices, this paper proposes CrossTrace, an eBPF-based, zero-intrusion end-to-end tracing solution. Methodologically: (1) eBPF is used to capture cross-thread invocation contexts; (2) span IDs are embedded into TCP packet headers via eBPF, giving secure, low-overhead inter-service correlation; (3) a greedy algorithm infers intra-service span relationships from delay patterns, without relying on thread identifiers. The key contribution is integrating eBPF with protocol-layer identifier embedding and delay-driven greedy inference, which achieves over 90% accuracy and correlates thousands of spans within seconds without compromising system security. The experimental evaluation indicates that the approach is suitable for production-grade microservice observability and root-cause diagnosis.
📝 Abstract
Distributed tracing has become an essential technique for debugging and troubleshooting modern microservice-based applications, enabling software engineers to detect performance bottlenecks, identify failures, and gain insights into system behavior. However, implementing distributed tracing in large-scale applications remains challenging due to the need for extensive instrumentation. To reduce this burden, zero-code instrumentation solutions, such as those based on eBPF, have emerged, allowing span data to be collected without modifying application code. Despite this promise, span correlation, the process of establishing causal relationships between spans, remains a critical challenge in zero-code approaches. Existing solutions often rely on thread affinity, compromise system security by requiring the kernel integrity mode to be disabled, or incur significant computational overhead due to complex inference algorithms. This paper presents CrossTrace, a practical and efficient distributed tracing solution designed to support the debugging of microservice applications without requiring source code modifications. CrossTrace employs a greedy algorithm to infer intra-service span relationships from delay patterns, eliminating reliance on thread identifiers. For inter-service correlation, CrossTrace embeds span identifiers into TCP packet headers via eBPF, enabling secure and efficient correlation without compromising system security policies. Evaluation results show that CrossTrace can correlate thousands of spans within seconds with over 90% accuracy, making it suitable for production deployment and valuable for microservice observability and diagnosis.
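To make the idea of greedy, delay-based intra-service correlation more concrete, the following is a minimal sketch and not CrossTrace's actual algorithm, which the abstract does not specify. It assumes each span carries only a start and end timestamp, and it uses a hypothetical heuristic: an outgoing call is attributed to the incoming request whose time window encloses it and whose start is closest to the call's start. The names `Span` and `correlate_greedy` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    start: float  # seconds since an arbitrary epoch
    end: float


def correlate_greedy(
    incoming: List[Span], outgoing: List[Span]
) -> Dict[str, Optional[str]]:
    """Map each outgoing span ID to an inferred parent incoming span ID.

    Assumed heuristic (not taken from the paper): a parent must fully
    enclose the child in time, and among enclosing parents the one with
    the smallest start-to-start delay wins. No thread IDs are used.
    """
    parents: Dict[str, Optional[str]] = {}
    # Process children in start order and commit to each choice immediately,
    # which is what makes the procedure greedy.
    for child in sorted(outgoing, key=lambda s: s.start):
        candidates = [
            p for p in incoming
            if p.start <= child.start and child.end <= p.end
        ]
        if not candidates:
            parents[child.span_id] = None  # no enclosing request observed
            continue
        best = min(candidates, key=lambda p: child.start - p.start)
        parents[child.span_id] = best.span_id
    return parents


incoming_spans = [Span("A", 0.0, 10.0), Span("B", 4.0, 9.0)]
outgoing_spans = [Span("x", 1.0, 3.0), Span("y", 5.0, 8.0), Span("z", 11.0, 12.0)]
print(correlate_greedy(incoming_spans, outgoing_spans))
```

In this toy input, `y` is enclosed by both `A` and `B`, and the heuristic picks `B` because its start is closer to the start of `y`. A real system would need a richer delay model to handle concurrent requests with similar timing.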