Trace Sampling 2.0: Code Knowledge Enhanced Span-level Sampling for Distributed Tracing

📅 2025-09-17
🤖 AI Summary
Distributed tracing data volume places a heavy burden on backend storage, and conventional trace-level sampling, which often retains only anomalous traces, discards information that diagnosis needs, such as the normal traces used for comparison. Trace Sampling 2.0 instead samples at the span level while keeping trace structure consistent, so every trace is retained at a reduced size. The authors implement this idea as Autoscope, which uses static analysis to extract execution logic and preserve critical spans without breaking structural integrity. Evaluated on two open-source microservices, Autoscope reduces trace size by 81.2% while maintaining 98.1% faulty span coverage, and improves root cause analysis by 8.3% on average.

📝 Abstract
Distributed tracing is an essential diagnostic tool in microservice systems, but the sheer volume of traces places a significant burden on backend storage. A common approach to mitigating this issue is trace sampling, which selectively retains traces based on specific criteria, often preserving only anomalous ones. However, this method frequently discards valuable information, including normal traces that are essential for comparative analysis. To address this limitation, we introduce Trace Sampling 2.0, which operates at the span level while maintaining trace structure consistency. This approach allows for the retention of all traces while significantly reducing storage overhead. Based on this concept, we design and implement Autoscope, a span-level sampling method that leverages static analysis to extract execution logic, ensuring that critical spans are preserved without compromising structural integrity. We evaluated Autoscope on two open-source microservices. Our results show that it reduces trace size by 81.2% while maintaining 98.1% faulty span coverage, outperforming existing trace-level sampling methods. Furthermore, we demonstrate its effectiveness in root cause analysis, achieving an average improvement of 8.3%. These findings indicate that Autoscope can significantly enhance observability and storage efficiency in microservices, offering a robust solution for performance monitoring.
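To make "span-level sampling while maintaining trace structure consistency" concrete, the following is a minimal sketch, not Autoscope's actual algorithm. It assumes the set of spans to keep has already been chosen (in the paper this decision comes from static analysis of execution logic; here it is simply passed in), and it keeps the trace a connected tree by re-attaching each surviving span to its nearest retained ancestor. The `Span` type, `prune_spans`, and all span names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str


def prune_spans(spans: List[Span], keep_ids: Set[str]) -> List[Span]:
    """Drop spans outside keep_ids and re-parent survivors.

    Assumes the input spans form a tree (no parent cycles). How keep_ids
    is chosen is outside this sketch.
    """
    by_id: Dict[str, Span] = {s.span_id: s for s in spans}

    def nearest_kept_ancestor(parent_id: Optional[str]) -> Optional[str]:
        # Walk upward until we reach a retained span or run out of parents.
        while parent_id is not None and parent_id not in keep_ids:
            parent = by_id.get(parent_id)
            parent_id = parent.parent_id if parent else None
        return parent_id

    return [
        Span(s.span_id, nearest_kept_ancestor(s.parent_id), s.name)
        for s in spans
        if s.span_id in keep_ids
    ]


# Hypothetical four-span trace: gateway -> auth -> db.query, gateway -> cart
trace = [
    Span("a", None, "gateway"),
    Span("b", "a", "auth"),
    Span("c", "b", "db.query"),
    Span("d", "a", "cart"),
]
pruned = prune_spans(trace, keep_ids={"a", "c", "d"})
for span in pruned:
    print(span.span_id, span.parent_id, span.name)
```

In this example the `auth` span is dropped and `db.query` is re-attached to `gateway`, so the remaining spans still form a single tree rather than an orphaned fragment.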
Problem

Research questions and friction points this paper is trying to address.

Reducing storage burden from high-volume distributed tracing
Preserving valuable trace information discarded by current sampling
Maintaining trace structural integrity while sampling at span level
Innovation

Methods, ideas, or system contributions that make the work stand out.

Span-level sampling with trace structure consistency
Static analysis extracts execution logic for sampling
Reduces storage while preserving faulty span coverage
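The two headline quantities, trace size reduction and faulty span coverage, can be illustrated with a small sketch. The definitions below are assumptions made for illustration (size reduction as the fraction of bytes removed, coverage as the fraction of faulty spans that survive sampling); the paper may define them differently. The function names and sample data are hypothetical.

```python
from typing import Dict, Set


def size_reduction(span_sizes: Dict[str, int], kept: Set[str]) -> float:
    """Fraction of total span bytes removed by sampling (assumed definition)."""
    total = sum(span_sizes.values())
    if total == 0:
        return 0.0  # nothing to reduce in an empty trace
    retained = sum(size for sid, size in span_sizes.items() if sid in kept)
    return 1.0 - retained / total


def faulty_span_coverage(faulty: Set[str], kept: Set[str]) -> float:
    """Fraction of faulty spans still present after sampling (assumed definition)."""
    if not faulty:
        return 1.0  # no faulty spans, so none were lost
    return len(faulty & kept) / len(faulty)


# Hypothetical span sizes in bytes, with one faulty span "c"
sizes = {"a": 100, "b": 300, "c": 400, "d": 200}
kept = {"a", "c"}
print(size_reduction(sizes, kept))         # 0.5
print(faulty_span_coverage({"c"}, kept))   # 1.0
```

Under these definitions, a sampler scores well only when it removes many bytes while still retaining the spans where faults occur, which is the trade-off the bullets above describe.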
Yulun Wu
The Chinese University of Hong Kong, Hong Kong SAR, China
Guangba Yu
Postdoc, The Chinese University of Hong Kong
Cloud Computing, LLMOps, AIOps, Distributed Systems, Chaos engineering
Zhihan Jiang
The Chinese University of Hong Kong, Hong Kong SAR, China
Yichen Li
The Chinese University of Hong Kong, Hong Kong SAR, China
Michael R. Lyu
Professor of Computer Science & Engineering, The Chinese University of Hong Kong
software engineering, software reliability, fault tolerance, machine learning, distributed systems