BayesFLo: Bayesian fault localization of complex software systems

📅 2024-03-12

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

Existing fault localization methods neither incorporate the domain and structural knowledge of test engineers nor provide a probabilistic risk assessment of potential root causes. In complex systems, this leaves excessively large sets of candidate root causes and drives up diagnostic cost. This paper proposes BayesFLo, a Bayesian framework for root-cause identification whose prior on root-cause probabilities follows the principles of combination hierarchy and heredity, which capture the expected structure of failure-inducing combinations and allow expert knowledge to be encoded with quantified uncertainty. Posterior root-cause probabilities are computed efficiently with algorithms that draw on integer programming and graph representations, yielding a probabilistic ranking of root causes. In two case studies, the Traffic Alert and Collision Avoidance System (TCAS) and the JMP Easy DOE platform, the method is reported to outperform existing approaches, reducing the average candidate root-cause set size by over 60% and lowering diagnostic effort and cost.

📝 Abstract

Software testing is essential for the reliable development of complex software systems. A key step in software testing is fault localization, which uses test data to pinpoint failure-inducing combinations for further diagnosis. Existing fault localization methods have two key limitations: they (i) do not incorporate domain and/or structural knowledge from test engineers, and (ii) do not provide a probabilistic assessment of risk for potential root causes. Such methods can thus fail to confidently whittle down the combinatorial number of potential root causes in complex systems, resulting in prohibitively high testing costs. To address this, we propose a novel Bayesian fault localization framework called BayesFLo, which leverages a flexible Bayesian model for identifying potential root causes with probabilistic uncertainty. Using a carefully-specified prior on root cause probabilities, BayesFLo permits the integration of domain and structural knowledge via the principles of combination hierarchy and heredity, which capture the expected structure of failure-inducing combinations. We then develop new algorithms for efficient computation of posterior root cause probabilities, leveraging recent tools from integer programming and graph representations. Finally, we demonstrate the effectiveness of BayesFLo over existing methods in two fault localization case studies on the Traffic Alert and Collision Avoidance System and the JMP Easy DOE platform.
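
To make the idea of posterior root cause probabilities concrete, here is a minimal Python sketch on a toy system. It is not the BayesFLo algorithm: it enumerates every root-cause configuration by brute force, whereas the paper develops efficient algorithms based on integer programming and graph representations. The sketch rests on several labeled assumptions: a test fails exactly when it contains at least one root-cause combination; combinations are independently root causes a priori; and the prior for a combination is the product of hypothetical per-factor-level probabilities `p_level`, which is one simple way to favor low-order combinations (hierarchy) and combinations built from suspicious levels (heredity). All function and parameter names are illustrative.

```python
from itertools import combinations, product


def candidate_combinations(n_levels, max_order):
    """List every combination of (factor, level) pairs up to max_order factors."""
    combos = []
    for order in range(1, max_order + 1):
        for factors in combinations(range(len(n_levels)), order):
            for lv in product(*(range(n_levels[f]) for f in factors)):
                combos.append(tuple(zip(factors, lv)))
    return combos


def covers(test, combo):
    """True if the test run sets every factor in combo to the combo's level."""
    return all(test[f] == level for f, level in combo)


def posterior_root_cause_probs(n_levels, tests, failed, p_level, max_order=2):
    """Brute-force posterior P(combination is a root cause | test outcomes).

    Assumed model (illustrative only): independent Bernoulli priors, and a
    test fails if and only if it covers at least one root-cause combination.
    Cost is exponential in the number of surviving candidates.
    """
    combos = candidate_combinations(n_levels, max_order)
    # Assumed prior: product of per-factor-level probabilities.
    prior = {}
    for c in combos:
        q = 1.0
        for factor_level in c:
            q *= p_level[factor_level]
        prior[c] = q

    passed_tests = [t for t, f in zip(tests, failed) if not f]
    failed_tests = [t for t, f in zip(tests, failed) if f]
    # A combination covered by a passed test cannot be a root cause.
    alive = [c for c in combos if not any(covers(t, c) for t in passed_tests)]

    post = {c: 0.0 for c in combos}
    evidence = 0.0
    for mask in product([False, True], repeat=len(alive)):
        roots = [c for c, m in zip(alive, mask) if m]
        # Every failed test must be explained by at least one root cause.
        if not all(any(covers(t, c) for c in roots) for t in failed_tests):
            continue
        weight = 1.0
        for c, m in zip(alive, mask):
            weight *= prior[c] if m else 1.0 - prior[c]
        evidence += weight
        for c in roots:
            post[c] += weight
    if evidence == 0.0:
        raise ValueError("outcomes cannot be explained with this max_order")
    return {c: w / evidence for c, w in post.items()}


if __name__ == "__main__":
    # Three two-level factors, four runs, only the last run fails.
    tests = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
    failed = [False, False, False, True]
    p_level = {(f, l): 0.1 for f in range(3) for l in range(2)}
    post = posterior_root_cause_probs([2, 2, 2], tests, failed, p_level)
    for combo, prob in sorted(post.items(), key=lambda kv: -kv[1])[:3]:
        print(combo, round(prob, 4))
```

In this toy run, every single factor level appears in some passed test, so only the three two-factor combinations inside the failed run remain candidates, each with posterior probability of roughly one third. Raising one entry of `p_level` shifts posterior mass toward the pairs that contain that level, which is how prior knowledge changes the ranking in this simplified model.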

Problem

Research questions and friction points this paper is trying to address.

Identifying failure-inducing combinations in complex software systems from test data.

Incorporating domain and structural knowledge from test engineers into fault localization.

Providing a probabilistic risk assessment for potential root causes.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian model for probabilistic fault localization

Integration of domain and structural knowledge via combination hierarchy and heredity

Efficient posterior computation using integer programming and graph representations

🔎 Similar Papers

Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis