Towards Efficient Verification of Parallel Applications with Mc SimGrid

📅 2025-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Non-systematic concurrency bugs in parallel programs are slow to detect and hard to attribute to a root cause. To address this, we propose two adaptations of the Optimal Dynamic Partial Order Reduction (ODPOR) algorithm, targeting bug discovery and bug explanation respectively. The first, RFS ODPOR, explores the state space out of order so that the search does not get stuck in large, uninteresting regions. The second applies ODPOR principles once a bug has been found to locate its origins efficiently. Integrated into the Mc SimGrid simulation framework, our approach supports validation on realistic-scale distributed applications. Experimental results demonstrate that, compared to conventional ODPOR, our method finds non-systematic bugs multiple orders of magnitude faster and generates compact, interpretable execution traces for root-cause tracing, which makes bugs easier to debug and verification results easier to understand.

📝 Abstract

Assessing the correctness of distributed and parallel applications is notoriously difficult because of the complexity of concurrent behaviors and the difficulty of reproducing bugs. In this context, Dynamic Partial Order Reduction (DPOR) techniques have proved successful at exploiting concurrency to verify applications without exploring all their behaviors. However, they may lack efficiency when tracking non-systematic bugs in realistically sized applications. In this paper, we suggest two adaptations of the Optimal Dynamic Partial Order Reduction (ODPOR) algorithm, with a particular focus on bug finding and explanation. The first adaptation is an out-of-order version called RFS ODPOR, which avoids getting stuck in large, uninteresting parts of the state space. Once a bug is found, the second adaptation takes advantage of ODPOR principles to efficiently find the origins of the bug.
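
To make "without exploring all their behaviors" concrete, the following minimal Python sketch brute-forces every interleaving of a toy two-process program and groups the interleavings into equivalence classes (Mazurkiewicz traces), where two interleavings are equivalent when they order every pair of dependent events the same way. This is not ODPOR, not RFS ODPOR, and not the Mc SimGrid API: the event encoding, the read/write dependency rule, and the names `interleavings`, `signature`, and `compare` are assumptions made purely for illustration. A DPOR-style explorer aims to visit at least one interleaving per class, and ODPOR exactly one, instead of all of them.

```python
from itertools import combinations

# Hypothetical toy model: a program is a list of (operation, variable)
# pairs, with operation "r" (read) or "w" (write).
# An event is (process id, index within the process, operation, variable).

def dependent(a, b):
    """Assumed dependency rule: same process, or same variable with a write."""
    if a[0] == b[0]:
        return True  # program order always matters
    return a[3] == b[3] and "w" in (a[2], b[2])

def interleavings(programs):
    """Yield every interleaving that respects each process's program order."""
    def rec(pos, prefix):
        if all(pos[p] == len(programs[p]) for p in range(len(programs))):
            yield tuple(prefix)
            return
        for p in range(len(programs)):
            if pos[p] < len(programs[p]):
                op, var = programs[p][pos[p]]
                prefix.append((p, pos[p], op, var))
                pos[p] += 1
                yield from rec(pos, prefix)
                pos[p] -= 1
                prefix.pop()
    yield from rec([0] * len(programs), [])

def signature(trace):
    """Ordered pairs of dependent events; equal signatures mean equivalent traces."""
    return frozenset((a, b) for a, b in combinations(trace, 2) if dependent(a, b))

def compare(programs):
    """Return (number of interleavings, number of equivalence classes)."""
    traces = list(interleavings(programs))
    return len(traces), len({signature(t) for t in traces})

# Only one cross-process pair conflicts: P0 reads y, P1 writes y.
programs = [[("w", "x"), ("r", "y")], [("w", "y"), ("r", "z")]]
print(compare(programs))  # (6, 2)
```

Here six interleavings collapse into two classes, one for each order of the conflicting read and write of `y`. The gap between the two counts grows quickly with program size, which is the saving that partial order reduction targets.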

Problem

Research questions and friction points this paper is trying to address.

Verifying correctness of parallel applications efficiently

Addressing inefficiency in tracking non-systematic bugs

Finding and explaining bug origins using ODPOR

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapts ODPOR algorithm for bug finding

Introduces RFS ODPOR, an out-of-order variant that avoids getting stuck in large, uninteresting parts of the state space

Uses ODPOR principles to trace bug origins
