Bi-Criteria Optimization for Combinatorial Bandits: Sublinear Regret and Constraint Violation under Bandit Feedback

📅 2025-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies bi-objective online optimization in combinatorial multi-armed bandits (CMAB) with bandit feedback: simultaneously minimizing cumulative regret and cumulative constraint violation (CCV). Existing approaches rely on problem-specific structural assumptions and struggle to balance both objectives. We propose the first generic framework that converts any $\delta$-resilient offline approximation algorithm into an online CMAB algorithm in a black-box manner. Our framework integrates $\texttt{N}$ oracle calls with an adaptive sampling strategy, requires no structural assumptions on the underlying optimization problem, and achieves the first joint sublinear upper bound of $O(\delta^{2/3} \texttt{N}^{1/3} T^{2/3} \log^{1/3} T)$ on both regret and CCV. We validate its effectiveness on submodular cover, submodular cost-cover, and fair submodular maximization tasks. The framework provides a transferable theoretical and algorithmic foundation for constrained combinatorial online learning.

📝 Abstract
In this paper, we study bi-criteria optimization for combinatorial multi-armed bandits (CMAB) with bandit feedback. We propose a general framework that transforms discrete bi-criteria offline approximation algorithms into online algorithms with sublinear regret and cumulative constraint violation (CCV) guarantees. Our framework requires the offline algorithm to provide an $(\alpha, \beta)$-bi-criteria approximation ratio with $\delta$-resilience and utilize $\texttt{N}$ oracle calls to evaluate the objective and constraint functions. We prove that the proposed framework achieves sublinear regret and CCV, with both bounds scaling as $O\left(\delta^{2/3} \texttt{N}^{1/3} T^{2/3} \log^{1/3}(T)\right)$. Crucially, the framework treats the offline algorithm with $\delta$-resilience as a black box, enabling flexible integration of existing approximation algorithms into the CMAB setting. To demonstrate its versatility, we apply our framework to several combinatorial problems, including submodular cover, submodular cost covering, and fair submodular maximization. These applications highlight the framework's broad utility in adapting offline guarantees to online bi-criteria optimization under bandit feedback.
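The abstract does not spell out the conversion mechanism, but the $T^{2/3}\log^{1/3}(T)$ rate is characteristic of explore-then-commit style reductions. The sketch below illustrates that general idea under loudly labeled assumptions: the `offline_oracle` interface (mapping empirical mean estimates to a committed combinatorial action), the `pull` feedback hook, and the exploration-budget formula are all hypothetical stand-ins, not the paper's actual algorithm or adaptive sampling strategy.

```python
import math

def offline_to_bandit(offline_oracle, pull, arms, T, delta=1.0, N=1):
    """Hedged sketch of a black-box offline-to-online conversion.

    Assumptions (not from the paper): `offline_oracle` takes a dict of
    empirical mean rewards and returns an action; `pull(arm)` returns
    bandit feedback for one round. The exploration budget is chosen so
    that estimation error and exploration cost balance at an O(T^{2/3})
    rate, mirroring the shape of the paper's bound.
    """
    # Exploration length ~ (delta^2 * N * T^2 * log T)^{1/3}, capped at T.
    explore_rounds = int((delta ** 2 * N * T ** 2 * math.log(T + 1)) ** (1 / 3))
    explore_rounds = max(1, min(explore_rounds, T))

    counts = {a: 0 for a in arms}
    means = {a: 0.0 for a in arms}
    history = []

    # Phase 1: uniform round-robin exploration to estimate each arm.
    for t in range(explore_rounds):
        a = arms[t % len(arms)]
        r = pull(a)
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
        history.append((a, r))

    # Phase 2: hand the estimates to the offline approximation
    # algorithm, treated purely as a black box, and commit to its output.
    committed = offline_oracle(means)
    for t in range(explore_rounds, T):
        history.append((committed, pull(committed)))
    return committed, history
```

A real instantiation would replace the commit phase with the paper's adaptive sampling strategy and track constraint estimates alongside rewards; this sketch only shows why the black-box property matters: the online layer never inspects how the oracle works.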
Problem

Research questions and friction points this paper is trying to address.

Bi-criteria optimization for combinatorial bandits
Sublinear regret and constraint violation guarantees
Transforming offline algorithms into online frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Black-box conversion of offline bi-criteria approximation algorithms into online CMAB algorithms
Joint sublinear bounds on both regret and cumulative constraint violation
No structural assumptions required on the underlying optimization problem