🤖 AI Summary
Deep learning-based image reconstruction generates realistic medical images from sparse data, but its “black-box” nature yields outputs that lack statistical reliability, which hinders clinical decision-making. To address this, we propose the first conformal prediction framework tailored to clinically meaningful semantic metrics, such as fat mass. The framework pairs pixel-level reconstructions with calibrated uncertainty quantification in metric space to deliver verifiable, semantically grounded confidence statements. Methodologically, we design a modular interface for black-box reconstruction algorithms, construct a conformal predictor grounded in clinical metrics, and integrate automatic detection of anomalous reconstructions alongside nearest-neighbor retrospective visualization. Evaluated on sparse-view CT reconstruction, our framework yields confidence intervals that are more clinically interpretable than conventional pixel-based bounds, provides rigorous coverage guarantees for fat quantification and radiotherapy planning, and identifies high-risk samples whose reconstructions look visually plausible yet have statistically aberrant metric values.
📝 Abstract
Modern deep learning reconstruction algorithms generate impressively realistic scans from sparse inputs, but the resulting scans can contain significant inaccuracies. This makes it difficult to make statistically guaranteed claims about the true state of a subject from scans reconstructed by these algorithms. In this study, we propose a framework for computing provably valid prediction bounds on claims derived from probabilistic black-box image reconstruction algorithms. The key insights behind our framework are to represent each reconstructed scan by a derived clinical metric of interest, and to calibrate bounds on the ground-truth metric with conformal prediction (CP) using a prior calibration dataset. These bounds convey interpretable feedback about the subject's state, and can also be used to retrieve nearest-neighbor reconstructed scans for visual inspection. We demonstrate the utility of this framework on sparse-view computed tomography (CT) for fat mass quantification and radiotherapy planning tasks. Results show that our framework produces bounds with a clearer semantic interpretation than conventional pixel-based bounding approaches. Furthermore, we can flag dangerous outlier reconstructions that look plausible but have statistically unlikely metric values.
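To make the calibration idea concrete, the following is a minimal sketch of metric-space split conformal prediction. It is not the authors' implementation. It assumes that the probabilistic reconstruction algorithm has already been run several times per subject and that each sampled scan has been reduced to one scalar metric (for example, fat mass). The bounds use a standard conformalized-quantile construction, which is one common choice and may differ from the paper's exact procedure. All function names, the synthetic data, and the outlier rule (a metric value falling outside the calibrated bounds) are hypothetical.

```python
import numpy as np


def sample_quantile_bounds(metric_samples, alpha):
    """Per-subject empirical quantiles of the sampled metric values.

    metric_samples has shape (n_subjects, n_samples_per_subject).
    """
    lo = np.quantile(metric_samples, alpha / 2, axis=1)
    hi = np.quantile(metric_samples, 1 - alpha / 2, axis=1)
    return lo, hi


def calibrate_offset(cal_samples, cal_truth, alpha):
    """Split-conformal offset computed from a calibration set.

    The score measures how far the ground-truth metric falls outside the
    uncalibrated sample-quantile interval (negative if it lies inside).
    """
    lo, hi = sample_quantile_bounds(cal_samples, alpha)
    scores = np.maximum(lo - cal_truth, cal_truth - hi)
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample corrected rank
    if k > n:
        return np.inf  # too few calibration subjects for this alpha
    return np.sort(scores)[k - 1]


def conformal_bounds(test_samples, offset, alpha):
    """Widen (or shrink) the sample-quantile interval by the calibrated offset."""
    lo, hi = sample_quantile_bounds(test_samples, alpha)
    return lo - offset, hi + offset


def flag_outliers(metric_values, lower, upper):
    """Assumed rule: flag metric values that fall outside the calibrated bounds."""
    return (metric_values < lower) | (metric_values > upper)


def nearest_sample_index(sample_metrics, target):
    """Index of the sampled reconstruction whose metric is closest to target."""
    return int(np.argmin(np.abs(sample_metrics - target)))


if __name__ == "__main__":
    # Synthetic stand-in data: no real scans or reconstruction model involved.
    rng = np.random.default_rng(0)
    alpha = 0.1

    def simulate(n_subjects, n_samples=32):
        truth = rng.normal(20.0, 5.0, size=n_subjects)
        # Sampled metrics are biased and overconfident relative to the truth.
        centers = truth + rng.normal(0.5, 2.0, size=n_subjects)
        samples = centers[:, None] + rng.normal(0.0, 0.5, size=(n_subjects, n_samples))
        return samples, truth

    cal_samples, cal_truth = simulate(500)
    test_samples, test_truth = simulate(2000)

    offset = calibrate_offset(cal_samples, cal_truth, alpha)
    lower, upper = conformal_bounds(test_samples, offset, alpha)
    coverage = np.mean((test_truth >= lower) & (test_truth <= upper))
    print(f"offset={offset:.3f}, empirical coverage={coverage:.3f}")

    # Retrieve the sampled scan closest to the upper bound for the first subject.
    idx = nearest_sample_index(test_samples[0], upper[0])
    print(f"nearest sample to upper bound for subject 0: {idx}")
```

Under the usual exchangeability assumption between calibration and test subjects, bounds built this way cover the ground-truth metric with probability at least 1 - alpha on average. The retrieval helper mirrors the abstract's idea of pulling up the reconstructed scan nearest to a bound for visual inspection.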