🤖 AI Summary
This paper studies online unconstrained non-monotone submodular maximization under stochastic bandit feedback, where noisy rewards take values in a known bounded interval. Existing methods have two limitations: their pseudo-regret bounds are loose, and they offer no problem-dependent characterization of hardness. To address both, we propose DG-ETC, an algorithm that integrates a double-greedy offline framework, an explore-then-commit mechanism, and gradient estimation techniques. We introduce a problem-dependent hardness measure for this setting that characterizes the transition between the logarithmic and polynomial pseudo-regret regimes. For the $1/2$-approximate pseudo-regret, our analysis establishes a problem-dependent bound of $O(d \log(dT))$ and a problem-independent bound of $O(d T^{2/3} \log^{1/3}(dT))$, both holding simultaneously and improving on prior results for stochastic non-monotone submodular optimization.
📝 Abstract
We address the online unconstrained submodular maximization problem (Online USM) in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non-monotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), which adapts the Double-Greedy approach from the offline and online full-information settings. DG-ETC simultaneously satisfies an $O(d \log(dT))$ problem-dependent upper bound and an $O(d T^{2/3} \log(dT)^{1/3})$ problem-independent upper bound for the $1/2$-approximate pseudo-regret, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness that characterizes the transition between the logarithmic and polynomial regimes of the upper bounds.
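To make the combination of Double-Greedy and Explore-then-Commit concrete, here is a minimal Python sketch. It is not the paper's exact procedure: it runs the classical randomized double-greedy pass over the $d$ elements, replaces each exact function value with the average of `m` noisy evaluations (the exploration phase), and returns the set to be played for the remaining rounds (the commit phase). The names `estimate`, `double_greedy_etc`, and `noisy_f`, and the fixed per-set sample count `m`, are assumptions made for illustration; the abstract does not specify how DG-ETC schedules its exploration.

```python
import random
from typing import Callable, FrozenSet, Set


def estimate(noisy_f: Callable[[FrozenSet[int]], float], s: Set[int], m: int) -> float:
    """Exploration step: average m noisy bandit evaluations of the set s."""
    fs = frozenset(s)
    return sum(noisy_f(fs) for _ in range(m)) / m


def double_greedy_etc(
    noisy_f: Callable[[FrozenSet[int]], float],
    d: int,
    m: int,
    rng: random.Random,
) -> Set[int]:
    """Randomized double greedy driven by empirical means.

    Hypothetical sketch: uses 4 * m * d exploration rounds in total and
    returns the set to commit to afterwards.
    """
    x: Set[int] = set()            # grows from the empty set
    y: Set[int] = set(range(d))    # shrinks from the full ground set
    for i in range(d):
        # Estimated marginal gain of adding i to x, clipped at zero.
        a = max(estimate(noisy_f, x | {i}, m) - estimate(noisy_f, x, m), 0.0)
        # Estimated marginal gain of removing i from y, clipped at zero.
        b = max(estimate(noisy_f, y - {i}, m) - estimate(noisy_f, y, m), 0.0)
        # Add i with probability a / (a + b); add it when both gains are zero.
        p_add = 1.0 if a + b == 0.0 else a / (a + b)
        if rng.random() < p_add:
            x.add(i)
        else:
            y.discard(i)
    return x  # x == y here; this is the committed set


if __name__ == "__main__":
    # Toy example: a noisy modular (hence submodular) reward.
    weights = [2.0, -1.0, 0.5, -3.0]
    noise_rng = random.Random(1)

    def noisy_reward(s: FrozenSet[int]) -> float:
        return sum(weights[j] for j in s) + noise_rng.uniform(-0.1, 0.1)

    committed = double_greedy_etc(noisy_reward, d=4, m=200, rng=random.Random(0))
    print(sorted(committed))
```

A larger `m` makes the estimated marginal gains more reliable at the cost of more exploration rounds, which is the trade-off the explore-then-commit analysis balances against the horizon $T$.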