Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit

📅 2024-10-11

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies online unconstrained non-monotone submodular maximization under stochastic bandit feedback, where noisy rewards take values in a known bounded interval. Existing methods have loose pseudo-regret bounds and lack a problem-dependent characterization of hardness. To address both limitations, we propose DG-ETC, an algorithm integrating a double-greedy offline framework, an explore-then-commit mechanism, and gradient estimation techniques. We introduce a problem-dependent hardness measure for this setting that characterizes the transition between the logarithmic and polynomial regimes of the pseudo-regret bounds. Our theoretical analysis establishes, for the $1/2$-approximate pseudo-regret, a problem-dependent bound of $O(d \log(dT))$ and a problem-independent bound of $O(d T^{2/3} \log^{1/3}(dT))$, both improving upon prior results for stochastic non-monotone submodular maximization.
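For readers unfamiliar with the regret notion, the block below writes out one common form of the $1/2$-approximate pseudo-regret alongside the two bounds quoted in the summary. The notation is an assumption made for illustration: $f$ is the unknown submodular function, $d$ is taken to be the size of the ground set, $S_t$ is the set played at round $t$, and $T$ is the horizon. The paper's exact definition may differ in details.

```latex
% Assumed notation (not taken from the paper): f is the unknown submodular
% function on subsets of [d], S_t is the set played at round t, T is the horizon.
R_T \;=\; \sum_{t=1}^{T} \Bigl( \tfrac{1}{2} \max_{S \subseteq [d]} f(S) \;-\; \mathbb{E}\bigl[ f(S_t) \bigr] \Bigr),
\qquad
R_T = O\bigl(d \log(dT)\bigr) \ \text{(problem-dependent)},
\quad
R_T = O\bigl(d\, T^{2/3} \log^{1/3}(dT)\bigr) \ \text{(problem-independent)}.
```

The factor $1/2$ reflects that the benchmark is half of the optimal value, matching the approximation ratio of the randomized Double-Greedy approach in the offline setting.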

📝 Abstract

We address the online unconstrained submodular maximization problem (Online USM), in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non-monotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC satisfies an $O(d\log(dT))$ problem-dependent upper bound for the $1/2$-approximate pseudo-regret, as well as an $O(dT^{2/3}\log(dT)^{1/3})$ problem-free one at the same time, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness characterizing the transition between the logarithmic and polynomial regimes for the upper bounds.
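As background for the Double-Greedy approach mentioned above, here is a minimal Python sketch of the classical randomized double greedy for offline unconstrained submodular maximization with an exact value oracle. It is not DG-ETC: there is no noise, no exploration phase, and no commit step. The function names `double_greedy` and `cut_value` and the three-node path graph are hypothetical choices made for illustration.

```python
import random


def double_greedy(f, ground_set, rng=None):
    """Randomized double greedy with an exact oracle f (illustrative sketch).

    Maintains X (grown from the empty set) and Y (shrunk from the full set);
    each element is either added to X or removed from Y, so X == Y at the end.
    """
    if rng is None:
        rng = random.Random()
    X = set()
    Y = set(ground_set)
    for i in ground_set:
        gain_add = f(X | {i}) - f(X)      # marginal gain of adding i to X
        gain_remove = f(Y - {i}) - f(Y)   # marginal gain of removing i from Y
        a = max(gain_add, 0.0)
        b = max(gain_remove, 0.0)
        # Add i with probability a / (a + b); if both gains are zero, add it.
        p = 1.0 if a + b == 0 else a / (a + b)
        if rng.random() < p:
            X.add(i)
        else:
            Y.discard(i)
    return X


def cut_value(S, edges=((0, 1), (1, 2))):
    """Cut function of a small path graph: non-monotone and submodular."""
    return sum((u in S) != (v in S) for u, v in edges)


chosen = double_greedy(cut_value, [0, 1, 2], random.Random(0))
print(sorted(chosen), cut_value(chosen))
```

With an exact oracle, this randomized rule is known to return a set whose expected value is at least half the optimum. In the bandit setting described in the abstract, the two marginal gains are not observed directly and would have to be estimated from noisy rewards, which is where an explore-then-commit phase comes in.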

Problem

Research questions and friction points this paper is trying to address.

Online unconstrained submodular maximization

Stochastic bandit feedback

Double-Greedy - Explore-then-Commit algorithm

Innovation

Methods, ideas, or system contributions that make the work stand out.

Double-Greedy - Explore-then-Commit

Non-monotone submodular function

Logarithmic regret optimization
