Bandit Guided Submodular Curriculum for Adaptive Subset Selection

📅 2025-11-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Conventional curriculum learning hinges on precisely defining sample hardness, which is difficult to do reliably. Method: This paper proposes an adaptive curriculum learning framework that integrates multi-armed bandits (MAB) with submodular optimization, modeling subset selection as an online sequential decision problem. We design ONLINESUBMOD, an online greedy algorithm that leverages utility-driven validation feedback as a dynamic reward signal to achieve no-regret curriculum orchestration. Crucially, our approach abandons predefined hardness assumptions, instead using submodular functions to capture sample diversity and representativeness and unifying online learning with multi-strategy sampling for difficulty-aware adaptive selection. Results: Extensive experiments on multiple vision and language benchmarks demonstrate that our method significantly outperforms conventional curriculum learning and bi-level optimization baselines, achieving a superior trade-off between accuracy and training efficiency.
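Below is a minimal sketch of how such a loop could look, assuming an EXP3-style weight update over submodular arms (each arm a marginal-gain oracle, here facility location) and clipped validation-improvement rewards. The paper's exact ONLINESUBMOD policy, reward shaping, and arm set may differ; `train_step` and `val_score` are hypothetical callbacks supplied by the training pipeline.

```python
import numpy as np

def facility_location_gains(sim):
    """Marginal-gain oracle for F(S) = sum_i max_{j in S} sim[i, j]."""
    def gain(selected, candidate):
        if not selected:
            return float(sim[:, candidate].sum())
        current = sim[:, selected].max(axis=1)
        return float(np.maximum(current, sim[:, candidate]).sum() - current.sum())
    return gain

def greedy_subset(gain, n_pool, budget):
    """Standard greedy maximization using a marginal-gain oracle."""
    selected = []
    for _ in range(budget):
        remaining = [c for c in range(n_pool) if c not in selected]
        selected.append(max(remaining, key=lambda c: gain(selected, c)))
    return selected

def bandit_curriculum(arms, n_pool, n_rounds, budget, train_step, val_score, gamma=0.1):
    """arms: one marginal-gain oracle per submodular objective.
    Rewards are clipped validation-score improvements; weights follow an
    EXP3-style update (one of several possible no-regret choices)."""
    K, weights = len(arms), np.ones(len(arms))
    prev = val_score()
    for _ in range(n_rounds):
        probs = (1 - gamma) * weights / weights.sum() + gamma / K
        k = int(np.random.choice(K, p=probs))
        subset = greedy_subset(arms[k], n_pool, budget)
        train_step(subset)                      # one training pass on the chosen subset
        score = val_score()
        reward = float(np.clip(score - prev, 0.0, 1.0))
        prev = score
        weights[k] *= np.exp(gamma * reward / (probs[k] * K))
    return weights
```

In this sketch, validation feedback enters only through the scalar reward, which is what lets the bandit reweight selection strategies without differentiating through the subset-selection step.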

📝 Abstract
Traditional curriculum learning proceeds from easy to hard samples, yet defining a reliable notion of difficulty remains elusive. Prior work has used submodular functions to induce difficulty scores in curriculum learning. We reinterpret adaptive subset selection and formulate it as a multi-armed bandit problem, where each arm corresponds to a submodular function guiding sample selection. We introduce ONLINESUBMOD, a novel online greedy policy that optimizes a utility-driven reward and provably achieves no-regret performance under various sampling regimes. Empirically, ONLINESUBMOD outperforms both traditional curriculum learning and bi-level optimization approaches across vision and language datasets, showing superior accuracy-efficiency trade-offs. More broadly, we show that validation-driven reward metrics offer a principled way to guide the curriculum schedule.
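For context, a no-regret guarantee of this kind is usually stated against the best fixed arm in hindsight. The formalization below is illustrative only, since the paper's exact reward definition and comparator are not reproduced here.

```latex
% Illustrative regret against the best fixed submodular arm; r_t(k) is the
% (validation-driven) reward for playing arm k at round t, k_t the arm chosen.
R_T \;=\; \max_{k \in [K]} \sum_{t=1}^{T} r_t(k) \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_t(k_t)\right],
\qquad \text{no-regret:}\quad \frac{R_T}{T} \to 0 \ \text{as } T \to \infty .
```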
Problem

Research questions and friction points this paper is trying to address.

Reinterprets adaptive subset selection as a multi-armed bandit problem
Introduces an online greedy policy for optimizing utility-driven curriculum learning
Demonstrates superior performance over traditional curriculum and bi-level methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-armed bandit formulation for adaptive subset selection, with submodular objectives as arms (see the sketch after this list)
Online greedy policy optimizing utility-driven reward
Validation-driven reward metrics guiding curriculum schedule
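As a sketch of what "submodular objectives as arms" could look like, here are two toy utility functions, one favoring representativeness and one penalizing redundancy. The specific objectives, the inner-product similarity kernel, and the `lam` trade-off parameter are illustrative assumptions, not the paper's arm set.

```python
import numpy as np

def facility_location(X, S):
    # Representativeness: every point should be close to some selected point.
    sim = X @ X.T
    return float(sim[:, S].max(axis=1).sum()) if S else 0.0

def graph_cut(X, S, lam=0.5):
    # Coverage of the full set with a penalty on redundancy inside the subset.
    if not S:
        return 0.0
    sim = X @ X.T
    return float(sim[:, S].sum() - lam * sim[np.ix_(S, S)].sum())

# Each "arm" is a different notion of subset utility; the bandit learns
# which one to trust at each stage of training.
ARMS = [facility_location, graph_cut]

X = np.random.randn(100, 16)            # toy feature matrix
print(facility_location(X, [3, 17, 42]), graph_cut(X, [3, 17, 42]))
```

Pairing each objective with the same greedy selector keeps the per-round computation identical across arms, so the bandit only has to learn which notion of utility is currently paying off on the validation set.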
Prateek Chanda
Department of Computer Science and Engineering, Indian Institute of Technology Bombay
Prayas Agrawal
Department of Computer Science and Engineering, Indian Institute of Technology Bombay
Saral Sureka
Department of Computer Science and Engineering, Indian Institute of Technology Bombay
Lokesh Reddy Polu
Department of Computer Science and Engineering, Indian Institute of Technology Bombay
Atharv Kshirsagar
Department of Computer Science and Engineering, Indian Institute of Technology Bombay
Ganesh Ramakrishnan
Professor, Department of Computer Science and Engineering, Indian Institute of Technology Bombay
Machine Learning · Relational Learning · Information Extraction · Question Answering · Text Analytics