🤖 AI Summary
This paper addresses the problem of minimizing the probability of an irreversible catastrophic error in online learning when the agent is allowed a limited number of queries to a mentor. To capture the essence of “catastrophe avoidance”, which conventional additive objectives fail to model, we formalize the goal as maximizing the product of per-round avoidance probabilities, i.e., the overall chance of never causing catastrophe. Theoretically, we first show a negative result: without further assumptions, any algorithm either queries the mentor on a constant fraction of rounds or is nearly guaranteed to cause catastrophe. On the positive side, when the mentor’s policy class is learnable in the standard online model, we give an algorithm whose regret and rate of querying the mentor both approach 0 as the time horizon grows. In short, a policy class that is learnable in the absence of catastrophic risk remains learnable in its presence, provided the agent can ask for help.
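In symbols, the objective can be sketched as follows; the notation (p_t for the per-round avoidance probability, q_t for a query indicator, T for the horizon) is ours, introduced for illustration rather than taken from the paper:

```latex
% Sketch of the objective in our own notation: maximize the product of
% per-round avoidance probabilities while the mentor-query rate vanishes.
\[
  \max_{\pi}\ \prod_{t=1}^{T} p_t(\pi)
  \;=\;
  \max_{\pi}\ \exp\!\Big(\sum_{t=1}^{T} \log p_t(\pi)\Big),
  \qquad
  \text{subject to query rate } \frac{1}{T}\sum_{t=1}^{T} q_t \to 0,
\]
% where p_t(\pi) is the chance of avoiding catastrophe in round t under
% policy \pi, and q_t \in \{0,1\} indicates whether the mentor is queried.
```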
📝 Abstract
Most learning algorithms with formal regret guarantees assume that all mistakes are recoverable and essentially rely on trying all possible behaviors. This approach is problematic when some mistakes are *catastrophic*, i.e., irreparable. We propose an online learning problem where the goal is to minimize the chance of catastrophe. Specifically, we assume that the payoff in each round represents the chance of avoiding catastrophe that round and try to maximize the product of payoffs (the overall chance of avoiding catastrophe) while allowing a limited number of queries to a mentor. We first show that in general, any algorithm either constantly queries the mentor or is nearly guaranteed to cause catastrophe. However, in settings where the mentor policy class is learnable in the standard online model, we provide an algorithm whose regret and rate of querying the mentor both approach 0 as the time horizon grows. Conceptually, if a policy class is learnable in the absence of catastrophic risk, it is learnable in the presence of catastrophic risk if the agent can ask for help.
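To make the protocol concrete, below is a minimal Python simulation of the "ask for help" loop the abstract describes. Everything in it (`run_episode`, the nearest-neighbor imitation policy, the t^(-1/2) query schedule, and the toy payoff with its 0.01 risk scale) is an illustrative assumption of ours, not the paper's actual algorithm or analysis:

```python
import math
import random

class NearestNeighborPolicy:
    """Toy imitation learner: replays the mentor's action at the nearest queried state."""

    def __init__(self):
        self.memory = []  # (state, action) pairs observed from the mentor

    def update(self, state, action):
        self.memory.append((state, action))

    def act(self, state):
        if not self.memory:
            return 0.0  # arbitrary default before the first query
        _, action = min(self.memory, key=lambda sa: abs(sa[0] - state))
        return action

def run_episode(T, policy, mentor, avoid_prob, should_query):
    """Simulate T rounds; return (overall avoidance probability, query rate)."""
    log_avoid = 0.0  # running product of per-round payoffs, tracked in log space
    queries = 0
    for t in range(T):
        state = random.random()  # toy i.i.d. state in [0, 1]
        if should_query(t):  # spend one of the limited mentor queries
            action = mentor(state)  # mentor supplies a safe action
            policy.update(state, action)  # learn by imitating it
            queries += 1
        else:
            action = policy.act(state)  # act on what has been learned so far
        log_avoid += math.log(avoid_prob(state, action))
    return math.exp(log_avoid), queries / T

# Toy instantiation: the mentor's safe action equals the state, and the chance
# of catastrophe in a round grows with the action's distance from that safe
# action (the 0.01 scale, our assumption, keeps per-round risk small). The
# t^(-1/2) query schedule makes the total number of queries sublinear in T,
# so the query rate vanishes as the horizon grows.
random.seed(0)
mentor = lambda s: s
avoid_prob = lambda s, a: math.exp(-0.01 * abs(a - s))
schedule = lambda t: random.random() < 1.0 / math.sqrt(t + 1)
prob, rate = run_episode(10_000, NearestNeighborPolicy(), mentor, avoid_prob, schedule)
print(f"P(no catastrophe) ~ {prob:.3f}, query rate ~ {rate:.3f}")
```

In this sketch the product of payoffs stays close to 1 even though the query rate decays toward 0, which is the qualitative behavior the positive result promises; the negative result corresponds to settings where no such imitation scheme can keep the product bounded away from 0 without constant querying.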