Probabilistic Runtime Verification, Evaluation and Risk Assessment of Visual Deep Learning Systems

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep neural networks achieve strong performance on benchmark datasets but exhibit high sensitivity to minor distribution shifts commonly encountered in real-world deployment, leading to overestimated operational accuracy. To address this, we propose a probabilistic runtime verification and risk assessment framework that jointly models the probability of distribution shift occurrence and the model's conditional correctness rate, enabling trustworthy accuracy estimation and quantitative risk evaluation. Our method introduces a novel binary-tree structure that integrates out-of-distribution detection outputs with conditional correctness probabilities, supporting fine-grained accuracy prediction and cost-sensitive, risk-aware decision-making. Experiments across five benchmark datasets demonstrate estimation errors typically between 0.01 and 0.1, substantially outperforming conventional evaluation. Furthermore, we present the first cost-aware risk–value co-evaluation in medical image segmentation, enabling clinically meaningful trade-off analysis between diagnostic reliability and operational cost.
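The core estimate described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name and inputs (a shift probability taken from an OOD detector, plus conditional correctness rates for each branch of the tree) are assumptions for the example.

```python
# Hypothetical sketch of the binary-tree accuracy estimate: the root
# splits on whether an input is distribution-shifted (probability from
# an OOD detector), and each leaf holds the network's conditional
# correctness rate under that condition. Summing P(path) * P(correct | path)
# over the leaves gives the expected runtime accuracy.

def estimated_accuracy(p_shift: float, acc_in_dist: float, acc_shifted: float) -> float:
    """Expected accuracy = sum over tree leaves of P(path) * P(correct | path)."""
    return (1.0 - p_shift) * acc_in_dist + p_shift * acc_shifted

# Example: the detector reports a 30% shift rate; the model is 95%
# accurate in-distribution but only 60% accurate under shift.
print(estimated_accuracy(0.3, 0.95, 0.60))  # 0.845
```

Note that conventional evaluation would report 0.95 here, which is exactly the overestimation the framework is meant to correct.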

📝 Abstract
Despite achieving excellent performance on benchmarks, deep neural networks often underperform in real-world deployment due to sensitivity to minor, often imperceptible shifts in input data, known as distributional shifts. These shifts are common in practical scenarios but are rarely accounted for during evaluation, leading to inflated performance metrics. To address this gap, we propose a novel methodology for the verification, evaluation, and risk assessment of deep learning systems. Our approach explicitly models the incidence of distributional shifts at runtime by estimating their probability from outputs of out-of-distribution detectors. We combine these estimates with conditional probabilities of network correctness, structuring them in a binary tree. By traversing this tree, we can compute credible and precise estimates of network accuracy. We assess our approach on five different datasets, with which we simulate deployment conditions characterized by differing frequencies of distributional shift. Our approach consistently outperforms conventional evaluation, with accuracy estimation errors typically ranging between 0.01 and 0.1. We further showcase the potential of our approach on a medical segmentation benchmark, wherein we apply our methods towards risk assessment by associating costs with tree nodes, informing cost-benefit analyses and value-judgments. Ultimately, our approach offers a robust framework for improving the reliability and trustworthiness of deep learning systems, particularly in safety-critical applications, by providing more accurate performance estimates and actionable risk assessments.
Problem

Research questions and friction points this paper is trying to address.

Addressing deep neural networks' sensitivity to distributional shifts in real-world deployment
Providing accurate runtime performance estimates under varying input data conditions
Enabling risk assessment for safety-critical applications of deep learning systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Estimates distributional-shift probability from OOD detector outputs
Combines shift and conditional correctness probabilities in a binary tree
Computes credible runtime accuracy estimates and cost-based risk assessments