🤖 AI Summary
This work addresses the limitations of traditional equivalence query learning models, which rely on either unrealistic full-information feedback or overly pessimistic adversarial assumptions, rendering them ill-suited for iterative deployment in modern machine learning systems. To bridge this gap, the paper introduces a more natural environmental model—symmetric counterexample generators—that unifies both random and minimal counterexample mechanisms, and investigates its learnability under both full-information and bandit feedback. Combining game-theoretic perspectives, adaptive weighting, and minimax analysis, the authors establish matching upper and lower bounds on the number of learning rounds in both feedback regimes, including the first tight guarantee under bandit feedback, thereby offering a rigorous theoretical foundation for interactive model updating.
📝 Abstract
Modern machine learning systems, such as generative models and recommendation systems, often evolve through a cycle of deployment, user interaction, and periodic model updates. This differs from standard supervised learning frameworks, which focus on loss or regret minimization over a fixed sequence of prediction tasks. Motivated by this setting, we revisit the classical model of learning from equivalence queries, introduced by Angluin (1988). In this model, a learner repeatedly proposes hypotheses and, when a deployed hypothesis is inadequate, receives a counterexample. Under fully adversarial counterexample generation, however, the model can be overly pessimistic. In addition, most prior work assumes a \emph{full-information} setting, where the learner also observes the correct label of the counterexample, an assumption that is not always natural.
We address these issues by restricting the environment to a broad class of less adversarial counterexample generators, which we call \emph{symmetric}. Informally, such generators choose counterexamples based only on the symmetric difference between the hypothesis and the target. This class captures natural mechanisms such as random counterexamples (Angluin and Dohrn, 2017; Bhatia, 2021; Chase, Freitag, and Reyzin, 2024), as well as generators that return the simplest counterexample according to a prescribed complexity measure. Within this framework, we study learning from equivalence queries under both full-information and bandit feedback. We obtain tight bounds on the number of learning rounds in both settings and highlight directions for future work. Our analysis combines a game-theoretic view of symmetric adversaries with adaptive weighting methods and minimax arguments.
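To make the protocol concrete, here is a minimal sketch (not taken from the paper) of equivalence-query learning over a finite domain with a finite hypothesis class. The learner uses the classic halving (majority-vote) strategy under full-information feedback, and the environment is a symmetric counterexample generator in the paper's informal sense: it picks a counterexample uniformly from the symmetric difference between the proposed hypothesis and the target, so its behavior depends only on that set. All names (`DOMAIN`, `halving_learner`, etc.) are illustrative choices, not notation from the paper.

```python
import random

DOMAIN = list(range(20))  # a small finite domain; hypotheses are subsets of it

def random_hypothesis(rng):
    return frozenset(x for x in DOMAIN if rng.random() < 0.5)

def symmetric_generator(hypothesis, target, rng):
    """Return a counterexample chosen only from the symmetric difference
    between hypothesis and target (a 'symmetric' generator), or None if
    the hypothesis is equivalent to the target."""
    diff = list(hypothesis ^ target)
    if not diff:
        return None
    return rng.choice(diff)

def halving_learner(hypotheses, target, rng):
    version_space = list(hypotheses)
    rounds = 0
    while True:
        # Propose the majority-vote hypothesis over the current version space.
        votes = {x: sum(x in h for h in version_space) for x in DOMAIN}
        guess = frozenset(x for x in DOMAIN if 2 * votes[x] > len(version_space))
        rounds += 1
        x = symmetric_generator(guess, target, rng)
        if x is None:
            return guess, rounds
        # Full-information feedback: the correct label of x is revealed, so
        # every hypothesis that mislabels x is removed. Since the majority
        # agreed with the (wrong) guess on x, at least half the version
        # space is eliminated each round.
        label = x in target
        version_space = [h for h in version_space if (x in h) == label]

rng = random.Random(0)
target = random_hypothesis(rng)
hypotheses = {random_hypothesis(rng) for _ in range(100)} | {target}
learned, rounds = halving_learner(hypotheses, target, rng)
assert learned == target
```

With roughly 100 hypotheses, the halving argument bounds the number of rounds by about log2 of the class size plus one, regardless of which counterexample the symmetric generator returns; the bandit setting studied in the paper withholds the revealed label, which is exactly what makes the analysis there harder.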