🤖 AI Summary
Distributed tracing data volume has surged, imposing heavy storage overhead. Conventional trace-level sampling, for example retaining only anomalous traces, often discards critical diagnostic information, including the normal traces needed for comparative analysis. To address this, we propose Trace Sampling 2.0, which samples at the span level while keeping each trace's structure consistent, so that all traces can be retained. We implement it as Autoscope, which uses static analysis to extract execution logic and pairs it with a structure-preservation scheme, compressing traces while keeping critical spans and structural integrity. Evaluated on two open-source microservice systems, Autoscope reduces trace size by 81.2% while covering 98.1% of faulty spans, outperforms existing trace-level sampling methods, and improves root cause analysis by 8.3% on average, demonstrating gains in both storage efficiency and diagnostic effectiveness.
📝 Abstract
Distributed tracing is an essential diagnostic tool in microservice systems, but the sheer volume of traces places a significant burden on backend storage. A common way to mitigate this is trace sampling, which selectively retains traces based on specific criteria, often preserving only anomalous ones. However, this approach frequently discards valuable information, including normal traces that are essential for comparative analysis. To address this limitation, we introduce Trace Sampling 2.0, which operates at the span level while maintaining trace structure consistency. This allows all traces to be retained while significantly reducing storage overhead. Based on this concept, we design and implement Autoscope, a span-level sampling method that leverages static analysis to extract execution logic, ensuring that critical spans are preserved without compromising structural integrity. We evaluated Autoscope on two open-source microservice systems. Our results show that it reduces trace size by 81.2% while maintaining 98.1% faulty span coverage, outperforming existing trace-level sampling methods. Furthermore, we demonstrate its effectiveness in root cause analysis, achieving an average improvement of 8.3%. These findings indicate that Autoscope can significantly enhance observability and storage efficiency in microservices, offering a robust solution for performance monitoring.
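To make the difference between trace-level and span-level sampling concrete, here is a minimal sketch of one possible reading of "span-level sampling that maintains trace structure consistency": instead of keeping or dropping a whole trace, it keeps individual spans chosen by a predicate, plus every ancestor of each kept span up to the root, so the retained spans still form one connected tree. This is not Autoscope's algorithm. Autoscope decides which spans are critical using execution logic extracted by static analysis, whereas the `keep` predicate here (an `is_error` flag) is a hypothetical stand-in, and the `Span` type and `sample_spans` function are illustrative names, not part of any real tracing library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass(frozen=True)
class Span:
    # Hypothetical, simplified span record: real tracing spans carry far more fields.
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    is_error: bool = False


def sample_spans(spans: List[Span], keep: Callable[[Span], bool]) -> List[Span]:
    """Keep spans selected by `keep` plus all of their ancestors.

    The root is always kept, so every trace survives sampling, and the kept
    set is closed under "parent of", so the result is still a connected tree.
    """
    by_id: Dict[str, Span] = {s.span_id: s for s in spans}
    kept_ids: Set[str] = set()
    for span in spans:
        if keep(span) or span.parent_id is None:
            node: Optional[Span] = span
            # Walk up the parent chain; stop at the first ancestor already kept,
            # since its own ancestors were added when it was kept.
            while node is not None and node.span_id not in kept_ids:
                kept_ids.add(node.span_id)
                node = by_id.get(node.parent_id) if node.parent_id else None
    # Return kept spans in their original order.
    return [s for s in spans if s.span_id in kept_ids]


if __name__ == "__main__":
    trace = [
        Span("a", None, "frontend /checkout"),
        Span("b", "a", "cart.GetCart"),
        Span("c", "a", "payment.Charge"),
        Span("d", "c", "db.Query", is_error=True),
        Span("e", "b", "cache.Get"),
    ]
    kept = sample_spans(trace, keep=lambda s: s.is_error)
    print([s.span_id for s in kept])  # prints ['a', 'c', 'd']
```

In this toy trace the normal branch (`b`, `e`) is dropped while the trace itself survives, and the faulty span `d` stays attached to the root through `c`. Keeping ancestors is one simple way to avoid orphaned spans; a real span-level sampler would also need a much richer notion of which spans are worth keeping.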