Do Not Imitate, Reinforce: Iterative Classification via Belief Refinement

📅 2026-04-23

🤖 AI Summary
This work addresses the limitations of conventional supervised classification, which relies on single-pass imitation learning and struggles to adapt to varying input complexities, often resulting in overconfident predictions and inflexible computational resource allocation. To overcome these issues, the authors propose Reinforced Iterative Classification (RIC), reframing classification as a sequential decision-making problem. RIC employs a recurrent agent guided by reinforcement learning to iteratively refine its predictive distribution and uses a value function to dynamically determine when to terminate inference. This approach enables an anytime, adaptive computation mechanism that matches the accuracy of supervised baselines on standard image classification benchmarks while significantly improving prediction calibration and dynamically allocating computational resources according to input complexity.
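
The value-based stopping rule described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `classify_anytime`, `toy_step`, `toy_value`, and `halt_threshold` are hypothetical, and the toy functions stand in for the learned recurrent agent and learned value function, whose actual form the summary does not specify.

```python
import math

def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify_anytime(step_fn, value_fn, init_logits, max_steps=10, halt_threshold=0.1):
    """Refine logits until the estimated remaining improvement is small.

    step_fn:  hypothetical stand-in for one refinement step of the agent.
    value_fn: hypothetical stand-in for the value function, read here as an
              estimate of how much improvement is still available.
    """
    logits = list(init_logits)
    steps_used = 0
    for _ in range(max_steps):
        if value_fn(logits) < halt_threshold:
            break  # little left to gain, so stop spending compute
        logits = step_fn(logits)
        steps_used += 1
    return softmax(logits), steps_used

# Toy stand-ins (assumptions): each step moves the logits halfway toward a
# fixed target, and the "value" is the largest remaining gap to that target.
TARGET = [4.0, 0.0, 0.0]

def toy_step(logits):
    return [l + 0.5 * (t - l) for l, t in zip(logits, TARGET)]

def toy_value(logits):
    return max(abs(t - l) for l, t in zip(logits, TARGET))

easy_probs, easy_steps = classify_anytime(toy_step, toy_value, [3.5, 0.0, 0.0])
hard_probs, hard_steps = classify_anytime(toy_step, toy_value, [0.0, 0.0, 0.0])
print(easy_steps, hard_steps)
```

In this toy setup the input that starts close to its final belief halts after fewer refinement steps than the one that starts from a flat belief, which is the adaptive-computation behaviour the summary attributes to RIC.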

๐Ÿ“ Abstract
Standard supervised classification trains models to imitate the exact labels provided by a perfect oracle. This imitation happens in a single pass, restricting the model to a fixed compute budget even when inputs vary in complexity. Moreover, the rigid training objective forces the model to express absolute certainty on its training data, resulting in overconfident predictions during evaluation. We propose Reinforced Iterative Classification (RIC), which replaces the imitative objective with Reinforcement Learning (RL). RIC deploys a recurrent agent that iteratively updates a predictive distribution over classes, receiving reward for stepwise improvement in prediction quality. The value function provides a natural halting criterion by estimating the remaining scope for improvement. We prove that the iterative formulation recovers the same optimal predictions as cross-entropy while yielding an anytime classifier. On image classification benchmarks, RIC matches the accuracy of supervised baselines with improved calibration and learns to allocate computation adaptively across inputs.
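
To make the link between stepwise rewards and cross-entropy concrete, here is a small worked sketch. The abstract does not state the exact reward, so this assumes one plausible choice: the gain in log-probability of the true class between consecutive beliefs, r_t = log p_t(y) - log p_{t-1}(y). Under that assumption the undiscounted rewards telescope, so the total return equals the drop in cross-entropy from the first belief to the last. The function names and the belief sequence below are hypothetical.

```python
import math

def cross_entropy(belief, label):
    """Cross-entropy of a single belief against a one-hot label."""
    return -math.log(belief[label])

def stepwise_rewards(beliefs, label):
    """Assumed reward: r_t = log p_t(y) - log p_{t-1}(y) for t = 1..T."""
    log_probs = [math.log(p[label]) for p in beliefs]
    return [later - earlier for earlier, later in zip(log_probs, log_probs[1:])]

def rewards_to_go(rewards):
    """Undiscounted sum of future rewards from each step onward.

    This is the quantity an undiscounted value function would estimate.
    """
    return [sum(rewards[t:]) for t in range(len(rewards))]

# Hypothetical sequence of beliefs over 3 classes, refined over 3 steps.
beliefs = [
    [1 / 3, 1 / 3, 1 / 3],
    [0.5, 0.3, 0.2],
    [0.7, 0.2, 0.1],
    [0.9, 0.06, 0.04],
]
label = 0

rewards = stepwise_rewards(beliefs, label)
total_return = sum(rewards)
ce_drop = cross_entropy(beliefs[0], label) - cross_entropy(beliefs[-1], label)
remaining = rewards_to_go(rewards)

# The two quantities agree up to floating-point error.
print(total_return, ce_drop)
```

Because the initial belief is fixed, maximizing this return is the same as minimizing the final cross-entropy, which is consistent with the abstract's claim that the iterative formulation recovers the cross-entropy optimum. The shrinking `remaining` values also show why a value estimate can serve as a halting signal under this assumed reward.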
Problem

Research questions and friction points this paper is trying to address.

supervised classification
overconfident predictions
fixed compute budget
imitation learning
prediction calibration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforced Iterative Classification
Reinforcement Learning
Anytime Classification
Adaptive Computation
Prediction Calibration