Wherefore Art Thou? Provenance-Guided Automatic Online Debugging with Lumos

📅 2026-03-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses root cause diagnosis in distributed systems, where non-deterministic bugs often show runtime symptoms far removed from their true origins and existing tools struggle to capture complete cross-component causal evidence at low overhead. The paper presents Lumos, an online debugging framework that combines static program analysis with dependency-guided instrumentation to selectively record, at low cost, the program state histories relevant to an observed anomaly. By reconstructing causal chains on demand, Lumos lets developers link runtime symptoms to their root causes. It is described as the first application-layer approach to support low-overhead, automated, on-demand fault tracing, and it needs only a few observed occurrences of a bug to localize the root cause.

📝 Abstract

Debugging distributed systems in production is inevitable and hard. Myriad interactions between concurrent components in modern, complex, large-scale systems cause non-deterministic bugs that offline testing and verification fail to capture. When bugs surface at runtime, their root causes may be far removed from their symptoms. To identify a root cause, developers often need evidence scattered across multiple components and traces. Unfortunately, existing tools fail to quickly and automatically record useful provenance information at low overhead, leaving developers to perform the onerous evidence collection manually. Lumos is an online debugging framework that exposes application-level bug provenance, the computational history linking the symptoms of an incident to their root causes. Lumos leverages dependency-guided instrumentation powered by static analysis to identify program state related to a bug's provenance, and exposes that state via lightweight on-demand recording. Given only a few occurrences of a bug, Lumos provides developers with enough evidence to identify its root cause while incurring low runtime overhead.
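
To make the idea of dependency-guided instrumentation concrete, the following is a minimal sketch, not Lumos's actual algorithm or API. It assumes a static analysis has already produced a dependency graph between program variables, and it computes a backward slice from the symptom variable to decide which state to record. The graph contents, the variable names, and the `backward_slice` helper are all hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical dependency graph: each variable maps to the variables it is
# computed from. A real tool would derive this from static analysis; here it
# is hand-written purely for illustration.
DEPENDS_ON = {
    "response_status": ["commit_ok", "timeout_flag"],
    "commit_ok": ["quorum_size", "ack_count"],
    "ack_count": ["peer_acks"],
    "timeout_flag": ["rpc_deadline"],
    "cache_hits": ["cache_size"],  # unrelated to the symptom below
}


def backward_slice(symptom, depends_on):
    """Return the symptom plus every variable it transitively depends on."""
    seen = {symptom}
    queue = deque([symptom])
    while queue:
        var = queue.popleft()
        for dep in depends_on.get(var, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Only variables in the slice would be instrumented for recording; state
# outside the slice (here, the cache counters) is left untouched.
to_record = backward_slice("response_status", DEPENDS_ON)
print(sorted(to_record))
```

Under these assumptions, recording is restricted to the state that can influence the symptom, which is one plausible way selective recording could keep runtime overhead low.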

Problem

Research questions and friction points this paper is trying to address.

distributed systems

online debugging

provenance

non-deterministic bugs

root cause analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

provenance-guided debugging

online debugging

dependency-guided instrumentation