AI Summary
Existing imaging methods struggle to jointly perform image reconstruction, semantic hypothesis modeling, and statistical significance quantification from a single observed image, thereby undermining the rigor of scientific hypothesis testing. To address this, we propose the first semantic hypothesis testing framework tailored for imaging inverse problems. Our approach integrates self-supervised computational imaging, vision-language models (VLMs), and e-value-based nonparametric hypothesis testing: VLMs translate natural-language descriptions into formal, testable semantic hypotheses; self-supervised reconstruction ensures high-fidelity image recovery; and e-value testing guarantees strict Type I error control without requiring parametric distributional assumptions. Evaluated on image phenotyping tasks, our method substantially improves statistical power while maintaining well-calibrated false-positive rates, achieving both statistical robustness and scientific interpretability.
Abstract
This paper proposes a framework for semantic hypothesis testing tailored to imaging inverse problems. Modern imaging methods struggle to support hypothesis testing, a core component of the scientific method that is essential for the rigorous interpretation of experiments and robust interfacing with decision-making processes. Image-based hypothesis testing is challenging for three main reasons. First, it is difficult to use a single observation to simultaneously reconstruct an image, formulate hypotheses, and quantify their statistical significance. Second, the hypotheses encountered in imaging are mostly semantic in nature, rather than quantitative statements about pixel values. Third, it is challenging to control test error probabilities because the null and alternative distributions are often unknown. Our proposed approach addresses these difficulties by leveraging concepts from self-supervised computational imaging, vision-language models, and non-parametric hypothesis testing with e-values. We demonstrate our proposed framework through numerical experiments related to image-based phenotyping, where we achieve excellent power while robustly controlling Type I errors.
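The e-value mechanism the abstract refers to can be illustrated with a toy Gaussian example. This is a minimal sketch under assumed distributions, not the paper's nonparametric procedure: for a simple null H0: X ~ N(0, 1), the statistic exp(λx − λ²/2) has expectation 1 under H0 and is therefore a valid e-value, and rejecting whenever it exceeds 1/α controls the Type I error at level α by Markov's inequality.

```python
import math
import random

# Toy e-value test (illustrative only; the distributions and lambda are
# assumptions of this sketch, not the paper's method).
def e_value(x: float, lam: float = 1.0) -> float:
    """Valid e-value for H0: X ~ N(0, 1): E_H0[exp(lam*X - lam^2/2)] = 1."""
    return math.exp(lam * x - lam**2 / 2)

def reject(x: float, alpha: float = 0.05) -> bool:
    """Reject H0 when e >= 1/alpha; Markov gives P_H0(reject) <= alpha."""
    return e_value(x) >= 1.0 / alpha

random.seed(0)
alpha, n = 0.05, 100_000

# Empirical Type I error under the null H0: X ~ N(0, 1).
type1 = sum(reject(random.gauss(0.0, 1.0), alpha) for _ in range(n)) / n

# Empirical power under a hypothetical alternative with mean shift 4.
power = sum(reject(random.gauss(4.0, 1.0), alpha) for _ in range(n)) / n

print(f"Type I error: {type1:.4f} (guaranteed <= {alpha})")
print(f"Power: {power:.4f}")
```

Note that the guarantee requires only that the e-value has expectation at most 1 under the null, which is what makes e-value tests attractive when the null and alternative distributions are otherwise unknown; the Gaussian forms above are purely for illustration.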