Online Nonsubmodular Optimization with Delayed Feedback in the Bandit Setting

📅 2025-08-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies online nonsubmodular optimization, addressing performance limitations under delayed feedback in the bandit setting. It proposes the DBGD-NF algorithm and an improved method based on blocked updates, which mitigate the effects of the average delay and the maximum delay respectively, improving convergence efficiency.

📝 Abstract
We investigate the online nonsubmodular optimization with delayed feedback in the bandit setting, where the loss function is $\alpha$-weakly DR-submodular and $\beta$-weakly DR-supermodular. Previous work has established an $(\alpha,\beta)$-regret bound of $\mathcal{O}(nd^{1/3}T^{2/3})$, where $n$ is the dimensionality and $d$ is the maximum delay. However, its regret bound relies on the maximum delay and is thus sensitive to irregular delays. Additionally, it couples the effects of delays and bandit feedback as its bound is the product of the delay term and the $\mathcal{O}(nT^{2/3})$ regret bound in the bandit setting without delayed feedback. In this paper, we develop two algorithms to address these limitations, respectively. Firstly, we propose a novel method, namely DBGD-NF, which employs the one-point gradient estimator and utilizes all the available estimated gradients in each round to update the decision. It achieves a better $\mathcal{O}(n\bar{d}^{1/3}T^{2/3})$ regret bound, which is relevant to the average delay $\bar{d} = \frac{1}{T}\sum_{t=1}^T d_t \leq d$. Secondly, we extend DBGD-NF by employing a blocking update mechanism to decouple the joint effect of the delays and bandit feedback, which enjoys an $\mathcal{O}(n(T^{2/3} + \sqrt{dT}))$ regret bound. When $d = \mathcal{O}(T^{1/3})$, our regret bound matches the $\mathcal{O}(nT^{2/3})$ bound in the bandit setting without delayed feedback. Compared to our first $\mathcal{O}(n\bar{d}^{1/3}T^{2/3})$ bound, it is more advantageous when the maximum delay $d = o(\bar{d}^{2/3}T^{1/3})$. Finally, we conduct experiments on structured sparse learning to demonstrate the superiority of our methods.
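The one-point gradient estimator mentioned in the abstract can be sketched as follows. This is a minimal illustration of the standard single-query estimator (query the loss once at a randomly perturbed point and scale by dimension over the perturbation radius); the function `f`, the toy quadratic loss, and the parameter choices below are assumptions for demonstration, not the exact construction used by DBGD-NF:

```python
import numpy as np

def one_point_gradient(f, x, delta, rng):
    """One-point (single-query) gradient estimator: sample a uniform
    direction u on the unit sphere, query f once at x + delta*u, and
    return (n/delta) * f(x + delta*u) * u. In expectation this is the
    gradient of a smoothed version of f."""
    n = x.shape[0]
    u = rng.normal(size=n)
    u /= np.linalg.norm(u)  # uniform direction on the unit sphere
    return (n / delta) * f(x + delta * u) * u

# Toy demonstration with a smooth quadratic loss (hypothetical example).
rng = np.random.default_rng(0)
f = lambda x: float(x @ x)
x = np.ones(3)
g = one_point_gradient(f, x, delta=0.5, rng=rng)
```

A single estimate is noisy; algorithms in the bandit setting average its effect across rounds, which is why the regret bounds above carry the dimension factor $n$.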
Problem

Research questions and friction points this paper is trying to address.

Optimizing nonsubmodular functions with delayed feedback
Improving regret bounds for irregular delay scenarios
Decoupling delay and bandit feedback effects
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses one-point gradient estimator for feedback
Employs blocking update to decouple delays
Optimizes regret bound with average delay
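The blocking update idea listed above can be illustrated with a minimal sketch: the decision is held fixed within each block of rounds and updated only at block boundaries, using whatever (possibly delayed) gradient estimates arrived in the meantime. The block size `K`, step size `eta`, and the simple buffering below are assumptions for demonstration, not the paper's exact procedure:

```python
import numpy as np

def blocked_updates(grads_by_round, K, x0, eta):
    """Hold the decision fixed within each block of K rounds and apply a
    single gradient-descent step per block, accumulating all (possibly
    delayed) gradient estimates that arrived during the block."""
    x = x0.copy()
    decisions = []
    buffer = np.zeros_like(x0)
    for t, g in enumerate(grads_by_round):
        decisions.append(x.copy())
        buffer += g                      # estimates arriving at round t
        if (t + 1) % K == 0:             # block boundary: one update per block
            x = x - eta * buffer
            buffer = np.zeros_like(x0)
    return decisions

# Toy run: a constant gradient of 1 arrives every round, blocks of size 2.
decisions = blocked_updates([np.ones(1)] * 4, K=2, x0=np.zeros(1), eta=0.1)
# The decision changes only at block boundaries: 0, 0, -0.2, -0.2
```

Because updates happen only once per block, a delay shorter than the block length no longer stalls the update schedule, which is the intuition behind decoupling the delay term from the bandit-feedback term in the regret bound.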
Sifan Yang
Nanjing University
Machine Learning, Optimization
Yuanyu Wan
Zhejiang University
Machine Learning, Online Learning, Distributed Optimization
Lijun Zhang
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; State Key Laboratory of Blockchain and Data Security, Zhejiang University, Hangzhou, China