Towards a Certificate of Trust: Task-Aware OOD Detection for Scientific AI

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
In scientific AI, data-driven regression models often fail on out-of-distribution (OOD) inputs, and existing OOD detection methods for regression are neither task-relevant nor verifiable. To address this, we propose a task-aware OOD detection framework, the first to adapt score-based diffusion models to regression settings. Our method models the joint likelihood of the input and the regression model's prediction, yielding interpretable "trust certificates" whose scores correlate strongly with actual prediction errors. By combining task-aware conditioning with score-based generative modeling, the framework delivers reliable, error-aligned uncertainty quantification. Evaluation across diverse scientific domains, including PDE solving, satellite remote sensing, and medical image segmentation, shows that joint likelihood estimation significantly outperforms state-of-the-art baselines. The implementation is open-sourced, providing a regression-specific, error-correlated, and verifiable confidence-assessment tool for scientific AI.

📝 Abstract
Data-driven models are increasingly adopted in critical scientific fields like weather forecasting and fluid dynamics. These methods can fail on out-of-distribution (OOD) data, but detecting such failures in regression tasks is an open challenge. We propose a new OOD detection method based on estimating joint likelihoods using a score-based diffusion model. This approach considers not just the input but also the regression model's prediction, providing a task-aware reliability score. Across numerous scientific datasets, including PDE datasets, satellite imagery and brain tumor segmentation, we show that this likelihood strongly correlates with prediction error. Our work provides a foundational step towards building a verifiable 'certificate of trust', thereby offering a practical tool for assessing the trustworthiness of AI-based scientific predictions. Our code is publicly available at https://github.com/bogdanraonic3/OOD_Detection_ScientificML
Problem

Research questions and friction points this paper is trying to address.

Detecting out-of-distribution failures in regression tasks
Providing task-aware reliability scores for AI predictions
Assessing trustworthiness of data-driven scientific models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses score-based diffusion model for OOD detection
Estimates joint likelihoods of inputs and predictions
Provides task-aware reliability scores for predictions
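The core idea above, scoring a sample by the joint likelihood of the input and the model's prediction rather than the input alone, can be sketched in a few lines. The paper fits a score-based diffusion model for this; the sketch below swaps in a multivariate Gaussian as a simplified stand-in density, purely to illustrate why the joint score is task-aware. The function names (`fit_joint_density`, `joint_log_likelihood`) are hypothetical and not from the paper's codebase.

```python
import numpy as np

def fit_joint_density(inputs, predictions, eps=1e-6):
    """Fit a Gaussian over joint (input, prediction) vectors.

    Simplified stand-in for the paper's score-based diffusion
    likelihood: the joint density p(x, y_hat) is what makes the
    score task-aware, whichever density model estimates it.
    """
    z = np.hstack([inputs, predictions])          # (n, d_x + d_y)
    mean = z.mean(axis=0)
    cov = np.cov(z, rowvar=False) + eps * np.eye(z.shape[1])
    return mean, cov

def joint_log_likelihood(mean, cov, x, y_pred):
    """Log-density of one (input, prediction) pair under the fit.

    Low values flag either an OOD input or a prediction that is
    implausible for that input, i.e. a likely regression failure.
    """
    z = np.concatenate([x, y_pred])
    d = z - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(z) * np.log(2.0 * np.pi) + logdet + quad)
```

Note that because the density is over the joint pair, an in-distribution input paired with a wildly wrong prediction scores low just as a shifted input does, which is exactly the error-aligned behavior the summary describes.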