Can a Bayesian Oracle Prevent Harm from an Agent?

📅 2024-08-09
🏛️ Robotics
📈 Citations: 7
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the lack of provable probabilistic safety guarantees for machine-learning systems by proposing a runtime risk-assessment framework built on Bayesian posteriors. Methodologically, it derives a context-dependent upper bound on the probability of violating a given safety specification under the true but unknown hypothesis, treating the i.i.d. and non-i.i.d. data regimes separately, and constructs runtime action guards via modeling of the hypothesis space and a conservative maximization over plausible hypotheses. The contributions are threefold: (1) a theoretically grounded and computationally tractable upper bound on the probability of safety violation; (2) a high-confidence, scalable action-filtering mechanism; and (3) a foundation for real-time safety verification of trustworthy AI that balances mathematical rigor with engineering feasibility.

📝 Abstract
Is there a way to design powerful AI systems based on machine learning methods that would satisfy probabilistic safety guarantees? With the long-term goal of obtaining a probabilistic guarantee that would apply in every context, we consider estimating a context-dependent bound on the probability of violating a given safety specification. Such a risk evaluation would need to be performed at run-time to provide a guardrail against dangerous actions of an AI. Noting that different plausible hypotheses about the world could produce very different outcomes, and because we do not know which one is right, we derive bounds on the safety violation probability predicted under the true but unknown hypothesis. Such bounds could be used to reject potentially dangerous actions. Our main results involve searching for cautious but plausible hypotheses, obtained by a maximization that involves Bayesian posteriors over hypotheses. We consider two forms of this result, in the i.i.d. case and in the non-i.i.d. case, and conclude with open problems towards turning such theoretical results into practical AI guardrails.
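To see why a Bayesian posterior can bound the risk under the true but unknown hypothesis, here is a minimal one-line derivation in the spirit of the paper (a simplified illustration, not its exact theorem): for a harm event $E$, context $c$, dataset $D$, and any hypothesis $h^*$ with posterior support,

$$P(h^* \mid D)\,P_{h^*}(E \mid c) \;\le\; \sum_{h} P(h \mid D)\,P_h(E \mid c) \quad\Longrightarrow\quad P_{h^*}(E \mid c) \;\le\; \frac{\sum_{h} P(h \mid D)\,P_h(E \mid c)}{P(h^* \mid D)}.$$

Because $h^*$ is unknown, a practical guard must lower-bound $P(h^* \mid D)$, which is where the maximization over cautious but plausible hypotheses comes in.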
Problem

Research questions and friction points this paper is trying to address.

Design AI systems with probabilistic safety guarantees
Estimate context-dependent bounds on safety violation risks
Derive safety bounds under unknown true hypotheses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian posteriors for safety bounds
Context-dependent risk evaluation
Maximization over cautious but plausible hypotheses (see the sketch below)
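The guard idea can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than the paper's construction: the finite hypothesis grid `harm_probs`, the uniform prior, the plausibility cutoff `alpha`, and the risk `tolerance`. The sketch keeps a posterior over candidate hypotheses and rejects an action whenever the most pessimistic still-plausible hypothesis predicts too much harm.

```python
import numpy as np

class CautiousGuard:
    """Toy runtime guard: bound harm risk by the worst-case harm
    probability among hypotheses that remain plausible under the
    Bayesian posterior. Illustrative only, not the paper's algorithm."""

    def __init__(self, harm_probs, prior=None, alpha=0.1, tolerance=0.05):
        # harm_probs[h] = P_h(harm | action) for each candidate hypothesis h.
        self.harm_probs = np.asarray(harm_probs, dtype=float)
        n = len(self.harm_probs)
        self.log_post = np.log(np.full(n, 1.0 / n) if prior is None
                               else np.asarray(prior, dtype=float))
        self.alpha = alpha          # plausibility cutoff relative to the MAP hypothesis
        self.tolerance = tolerance  # maximum acceptable bound on harm probability

    def update(self, harmed: bool):
        # Bayesian update from one observed outcome (harm / no harm),
        # treating outcomes as i.i.d. Bernoulli under each hypothesis.
        lik = self.harm_probs if harmed else 1.0 - self.harm_probs
        self.log_post += np.log(np.clip(lik, 1e-12, None))
        self.log_post -= np.max(self.log_post)  # rescale in log space for stability

    def posterior(self):
        p = np.exp(self.log_post)
        return p / p.sum()

    def risk_bound(self):
        # Cautious bound: max harm probability over hypotheses whose
        # posterior is within a factor alpha of the MAP posterior.
        post = self.posterior()
        plausible = post >= self.alpha * post.max()
        return float(self.harm_probs[plausible].max())

    def allow(self) -> bool:
        return self.risk_bound() <= self.tolerance

# Usage: hypotheses say the action causes harm with prob 1%, 10%, or 60%.
guard = CautiousGuard(harm_probs=[0.01, 0.10, 0.60])
print(guard.allow())            # False: the 60%-harm hypothesis is still plausible
for _ in range(30):
    guard.update(harmed=False)  # observe 30 harmless trials
print(guard.risk_bound())       # pessimistic hypotheses have lost posterior mass
print(guard.allow())            # True once only the low-harm hypothesis is plausible
```

The design choice worth noting is that the bound comes from a maximization over plausible hypotheses rather than a posterior average: averaging could let a dangerous hypothesis with modest posterior mass be outvoted, whereas the cautious maximum keeps it controlling the decision until evidence rules it out.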
👥 Authors
Y. Bengio (Mila, Université de Montréal)
Michael K. Cohen (University of California, Berkeley)
Nikolay Malkin (University of Edinburgh)
Matt MacDermott (Imperial College London / Mila / LawZero) · Artificial Intelligence
Damiano Fornasiere (Mila, Université de Montréal, Universitat de Barcelona)
Pietro Greiner (PhD student, Mila) · AI safety, Probabilistic ML
Younesse Kaddar (University of Oxford)