🤖 AI Summary
This study addresses the computational bottleneck of existing decomposition-based system reliability methods, whose cost escalates rapidly as the number of components in a coherent system grows. To overcome this limitation, the authors propose the Reference-state System Reliability (RSR) method, which replaces the traditional decomposition of the component-state space into disjoint hypercubes with the classification of Monte Carlo samples against reference states, carried out with batched matrix operations. The RSR approach makes computational cost significantly less sensitive to the number of reference states, extends naturally to multi-state systems, and exploits high-throughput matrix hardware for acceleration. Experimental results show that RSR evaluates the system-state probability of a graph-based system with 119 nodes and 295 edges within 10 seconds and scales to hundreds of thousands of reference states, well beyond the reach of existing methods. Convergence slows, however, when the number of boundary reference states becomes exceedingly large.
📝 Abstract
Coherent systems are representative of many practical applications, ranging from infrastructure networks to supply chains. Probabilistic evaluation of such systems remains challenging, however, because existing decomposition-based methods scale poorly as the number of components grows. To address this limitation, this study proposes the Reference-state System Reliability (RSR) method. Like existing approaches, RSR characterises the boundary between different system states using reference states in the component-state space. Where it departs from these methods is in how the state space is explored: rather than using reference states to decompose the space into disjoint hypercubes, RSR uses them to classify Monte Carlo samples, making computational cost significantly less sensitive to the number of reference states. To make this classification efficient, samples and reference states are stored as matrices and compared using batched matrix operations, allowing RSR to exploit the advances in high-throughput matrix computing driven by modern machine learning. We demonstrate that RSR evaluates the system-state probability of a graph with 119 nodes and 295 edges within 10 seconds, highlighting its potential for real-time risk assessment of large-scale systems. We further show that RSR scales to problems involving hundreds of thousands of reference states, well beyond the reach of existing methods, and extends naturally to multi-state systems. Nevertheless, when the number of boundary reference states grows exceedingly large, RSR's convergence slows down, a limitation shared with existing reference-state-based approaches that motivates future research into learning-based representations of system-state boundaries.
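The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary component states, and the function names (`classify_samples`, `estimate_probabilities`), the specific matrix-product test, and the three-component toy system are all hypothetical choices made for illustration. A sample is labelled as survival if it dominates some survival reference state component-wise, as failure if it is dominated by some failure reference state, and as unknown otherwise.

```python
import numpy as np


def classify_samples(samples, survival_refs, failure_refs):
    """Label each binary sample 1 (survival), 0 (failure) or -1 (unknown).

    Illustrative sketch only; assumes binary states and a consistent,
    non-empty set of reference states for a coherent system.
    """
    X = np.asarray(samples, dtype=np.int64)        # shape (N, n)
    S = np.asarray(survival_refs, dtype=np.int64)  # shape (Rs, n)
    F = np.asarray(failure_refs, dtype=np.int64)   # shape (Rf, n)

    # x >= s component-wise  <=>  no component with s = 1 and x = 0
    #                        <=>  (1 - x) . s == 0
    survives = (((1 - X) @ S.T) == 0).any(axis=1)
    # x <= f component-wise  <=>  no component with x = 1 and f = 0
    #                        <=>  x . (1 - f) == 0
    fails = ((X @ (1 - F).T) == 0).any(axis=1)

    labels = np.full(X.shape[0], -1, dtype=np.int64)
    labels[survives] = 1
    labels[fails] = 0
    return labels


def estimate_probabilities(probs, survival_refs, failure_refs,
                           n_samples=200_000, seed=0):
    """Monte Carlo estimate of the survival, failure and unknown fractions."""
    rng = np.random.default_rng(seed)
    p = np.asarray(probs, dtype=float)
    # Each component works (state 1) independently with probability p[i].
    X = (rng.random((n_samples, p.size)) < p).astype(np.int64)
    labels = classify_samples(X, survival_refs, failure_refs)
    return {
        "survival": float(np.mean(labels == 1)),
        "failure": float(np.mean(labels == 0)),
        "unknown": float(np.mean(labels == -1)),
    }


# Toy system (hypothetical): works if component 0 works AND (1 OR 2 works).
SURVIVAL_REFS = [[1, 1, 0], [1, 0, 1]]  # minimal surviving states
FAILURE_REFS = [[0, 1, 1], [1, 0, 0]]   # maximal failing states

if __name__ == "__main__":
    print(estimate_probabilities([0.9, 0.9, 0.9], SURVIVAL_REFS, FAILURE_REFS))
```

For this toy system the exact survival probability is 0.9 × (1 − 0.1²) = 0.891, so the estimate can be checked against it. In the sketch, the cost of classification is two matrix products whose sizes grow linearly with the number of reference states, which is one way to read the abstract's claim about reduced sensitivity; a multi-state variant would need component-wise comparisons instead of the binary dot-product test used here.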