Trace Sampling 2.0: Code Knowledge Enhanced Span-level Sampling for Distributed Tracing

📅 2025-09-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Distributed tracing data volumes have grown to the point where storage overhead is a significant burden. Conventional trace-level sampling, which often retains only anomalous traces, discards diagnostic information such as the normal traces needed for comparative analysis. Trace Sampling 2.0 addresses this by sampling at the span level while keeping trace structure consistent, so every trace is retained in reduced form. The authors implement the idea in Autoscope, which uses static analysis of the code to extract execution logic and ensure that critical spans are preserved. Evaluated on two open-source microservice systems, Autoscope reduces trace size by 81.2% while maintaining 98.1% faulty span coverage, and it improves root cause analysis by 8.3% on average.

📝 Abstract

Distributed tracing is an essential diagnostic tool in microservice systems, but the sheer volume of traces places a significant burden on backend storage. A common approach to mitigating this issue is trace sampling, which selectively retains traces based on specific criteria, often preserving only anomalous ones. However, this method frequently discards valuable information, including normal traces that are essential for comparative analysis. To address this limitation, we introduce Trace Sampling 2.0, which operates at the span level while maintaining trace structure consistency. This approach allows for the retention of all traces while significantly reducing storage overhead. Based on this concept, we design and implement Autoscope, a span-level sampling method that leverages static analysis to extract execution logic, ensuring that critical spans are preserved without compromising structural integrity. We evaluated Autoscope on two open-source microservices. Our results show that it reduces trace size by 81.2% while maintaining 98.1% faulty span coverage, outperforming existing trace-level sampling methods. Furthermore, we demonstrate its effectiveness in root cause analysis, achieving an average improvement of 8.3%. These findings indicate that Autoscope can significantly enhance observability and storage efficiency in microservices, offering a robust solution for performance monitoring.
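To make the two headline numbers concrete, the following is a minimal sketch of how a size reduction and a faulty span coverage figure could be computed. The function names and the definitions themselves are assumptions made for illustration; the abstract does not spell out the exact formulas, and the sample values below are invented, not taken from the evaluation.

```python
from typing import Set


def size_reduction(original_bytes: int, sampled_bytes: int) -> float:
    """Fraction of the original trace size removed by sampling (assumed definition)."""
    return 1.0 - sampled_bytes / original_bytes


def faulty_span_coverage(faulty_ids: Set[str], retained_ids: Set[str]) -> float:
    """Share of known-faulty spans that survive sampling (assumed definition)."""
    if not faulty_ids:
        return 1.0  # nothing to cover, so nothing was lost
    return len(faulty_ids & retained_ids) / len(faulty_ids)


# Hypothetical numbers, chosen only to show the arithmetic.
reduction = size_reduction(original_bytes=1000, sampled_bytes=188)
coverage = faulty_span_coverage({"s1", "s2", "s3", "s4"}, {"s1", "s2", "s3", "x9"})
print(f"reduction={reduction:.1%} coverage={coverage:.1%}")
```

Under these assumed definitions, a reduction of 81.2% means that roughly 188 bytes remain for every 1000 bytes of original trace data.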

Problem

Research questions and friction points this paper is trying to address.

Reducing storage burden from high-volume distributed tracing

Preserving valuable trace information discarded by current sampling

Maintaining trace structural integrity while sampling at span level

Innovation

Methods, ideas, or system contributions that make the work stand out.

Span-level sampling with trace structure consistency

Static analysis extracts execution logic for sampling

Reduces storage while preserving faulty span coverage
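One way to picture span-level sampling with structural consistency is sketched below: spans that are not selected are dropped, and each surviving span is re-attached to its nearest retained ancestor so the trace remains a connected tree. This is an illustrative sketch, not Autoscope's implementation. The `Span` type, the `sample_spans` function, and the hand-picked `keep` set are hypothetical; in the paper, the decision about which spans to keep comes from static analysis of execution logic, which is not modeled here. The sketch also assumes the root span is always kept.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str


def sample_spans(spans: List[Span], keep: Set[str]) -> List[Span]:
    """Drop spans outside `keep` and re-parent survivors to their nearest kept ancestor."""
    by_id: Dict[str, Span] = {s.span_id: s for s in spans}

    def nearest_kept_ancestor(parent_id: Optional[str]) -> Optional[str]:
        # Walk up the original tree until a retained span (or the root's None) is reached.
        while parent_id is not None and parent_id not in keep:
            parent = by_id.get(parent_id)
            parent_id = parent.parent_id if parent else None
        return parent_id

    return [
        Span(s.span_id, nearest_kept_ancestor(s.parent_id), s.name)
        for s in spans
        if s.span_id in keep
    ]


# Hypothetical five-span trace: gateway -> auth.check -> cache.lookup -> db.query, plus render.
trace = [
    Span("a", None, "gateway"),
    Span("b", "a", "auth.check"),
    Span("c", "b", "cache.lookup"),
    Span("d", "c", "db.query"),
    Span("e", "a", "render"),
]
sampled = sample_spans(trace, keep={"a", "d", "e"})
for span in sampled:
    print(span.span_id, span.parent_id, span.name)
```

In this example the `db.query` span loses its two intermediate ancestors and is attached directly to `gateway`, so the reduced trace still records which request the query belonged to.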

🔎 Similar Papers

Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis