🤖 AI Summary
Conventional curriculum learning hinges on a precise definition of sample hardness, which is difficult to specify in practice. Method: This paper proposes an adaptive curriculum learning framework that integrates multi-armed bandits (MAB) with submodular optimization, modeling subset selection as an online sequential decision problem. We design ONLINESUBMOD, an online greedy algorithm that uses utility-driven validation feedback as a dynamic reward signal to achieve no-regret curriculum orchestration. Crucially, our approach abandons predefined hardness assumptions: submodular functions capture sample diversity and representativeness, and online learning is unified with multi-strategy sampling for difficulty-aware adaptive selection. A sketch of this bandit loop follows below. Results: Extensive experiments on multiple vision and language benchmarks demonstrate that our method significantly outperforms conventional curriculum learning and bilevel optimization baselines, achieving a superior trade-off between accuracy and training efficiency.
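As a rough illustration of the framework described above, the following minimal Python sketch treats each submodular selection strategy as a bandit arm and uses validation feedback as the reward. The EXP3-style update, the `arms` list, and the `train_and_validate` callback are illustrative assumptions, not the paper's actual ONLINESUBMOD policy or API.

```python
import numpy as np

def exp3_curriculum(arms, train_and_validate, rounds, gamma=0.1, rng=None):
    """EXP3-style arm selection over submodular selection strategies.

    `arms` is a list of subset-selection functions (one per submodular
    objective); `train_and_validate(subset_fn)` is assumed to train for one
    interval on the chosen subset and return a validation score in [0, 1].
    """
    rng = rng or np.random.default_rng(0)
    k = len(arms)
    weights = np.ones(k)
    history = []
    for _ in range(rounds):
        # Mix exponential weights with uniform exploration.
        probs = (1 - gamma) * weights / weights.sum() + gamma / k
        i = rng.choice(k, p=probs)
        reward = train_and_validate(arms[i])  # validation feedback as reward
        # Importance-weighted exponential update for the played arm.
        weights[i] *= np.exp(gamma * reward / (probs[i] * k))
        history.append((i, reward))
    return history
```

An adversarial-bandit update is used here because validation rewards shift as the model trains; the paper's actual policy and regret analysis may weight feedback differently.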
📝 Abstract
Traditional curriculum learning proceeds from easy to hard samples, yet defining a reliable notion of difficulty remains elusive. Prior work has used submodular functions to induce difficulty scores in curriculum learning. We reinterpret adaptive subset selection and formulate it as a multi-armed bandit problem, where each arm corresponds to a submodular function guiding sample selection. We introduce ONLINESUBMOD, a novel online greedy policy that optimizes a utility-driven reward and provably achieves no-regret performance under various sampling regimes. Empirically, ONLINESUBMOD outperforms both traditional curriculum learning and bi-level optimization approaches across vision and language datasets, showing superior accuracy-efficiency tradeoffs. More broadly, we show that validation-driven reward metrics offer a principled way to guide the curriculum schedule.
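To make the "each arm is a submodular function" framing concrete, here is a standard greedy maximizer for one such arm, the facility-location objective, which rewards representativeness. The function name and similarity-matrix interface are illustrative assumptions; the paper may use different submodular objectives or a lazy-greedy variant.

```python
import numpy as np

def facility_location_greedy(sim, budget):
    """Greedy maximization of the facility-location function
    f(S) = sum_j max_{i in S} sim[i, j], a classic submodular objective.
    `sim` is an (n, n) pairwise similarity matrix over training samples.
    """
    n = sim.shape[0]
    selected, coverage = [], np.zeros(n)
    for _ in range(budget):
        # Marginal gain of adding each candidate i to the current set.
        gains = np.maximum(sim, coverage).sum(axis=1) - coverage.sum()
        gains[selected] = -np.inf  # exclude already-chosen items
        best = int(np.argmax(gains))
        selected.append(best)
        coverage = np.maximum(coverage, sim[best])
    return selected
```

With `sim` built from, say, normalized feature dot products, the returned indices form a representative subset; in a bandit round, this subset would be handed to the trainer and the resulting validation gain scored as that arm's reward.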