AI Summary
This study addresses the online multi-objective resource selection problem, where an agent may probe up to $q$ candidate resources before execution but can commit to only one, which places the setting between the classical bandit and full-information expert models. To tackle this, the authors propose the PtC-P-UCB algorithm, which guides probing via a hypervolume-inspired frontier-coverage potential and makes commitment decisions based on marginal hypervolume gain. The work establishes a theoretical framework for multi-objective bandits under limited probing, revealing a $1/\sqrt{q}$ acceleration effect, and extends the analysis to multi-modal probing. The resulting bounds include a Pareto hypervolume error of $\widetilde{O}(K_P d/\sqrt{qT})$, where $K_P$ is the Pareto-frontier size, and a scalarized regret of $\widetilde{O}(L_\phi d\sqrt{(K/q)T})$, which quantify how limited probing improves learning efficiency.
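To make the $1/\sqrt{q}$ factor concrete, the stated scalarized regret rate can be read at its two endpoints. This is a direct substitution into the bound as quoted, with logarithmic factors suppressed, and is not an additional result:

\[
q = 1:\ \widetilde{O}\big(L_\phi d\sqrt{KT}\big),
\qquad
q = K:\ \widetilde{O}\big(L_\phi d\sqrt{T}\big),
\qquad
\frac{\sqrt{(K/q)\,T}}{\sqrt{KT}} = \frac{1}{\sqrt{q}} .
\]

For example, probing $q = 4$ candidates per round halves the leading term relative to the bandit endpoint $q = 1$. The hypervolume error depends on $q$ only through $1/\sqrt{qT}$, so the same factor applies there.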
Abstract
We study an online resource-selection problem motivated by multi-radio access selection and mobile edge computing offloading. In each round, an agent chooses among $K$ candidate links/servers (arms) whose performance is a stochastic $d$-dimensional vector (e.g., throughput, latency, energy, reliability). The key interaction is \emph{probe-then-commit (PtC)}: the agent may probe up to $q>1$ candidates via control-plane measurements to observe their vector outcomes, but must execute exactly one candidate in the data plane. This limited multi-arm feedback regime strictly interpolates between classical bandits ($q=1$) and full-information experts ($q=K$), yet existing multi-objective learning theory largely focuses on these extremes. We develop \textsc{PtC-P-UCB}, an optimistic probe-then-commit algorithm whose technical core is frontier-aware probing under uncertainty in a Pareto mode, e.g., it selects the $q$ probes by approximately maximizing a hypervolume-inspired frontier-coverage potential and commits by marginal hypervolume gain to directly expand the attained Pareto region. We prove a dominated-hypervolume frontier error of $\tilde{O} (K_P d/\sqrt{qT})$, where $K_P$ is the Pareto-frontier size and $T$ is the horizon, and scalarized regret $\tilde{O} (L_\phi d\sqrt{(K/q)T})$, where $\phi$ is the scalarizer. These quantify a transparent $1/\sqrt{q}$ acceleration from limited probing. We further extend to \emph{multi-modal probing}: each probe returns $M$ modalities (e.g., CSI, queue, compute telemetry), and uncertainty fusion yields variance-adaptive versions of the above bounds via an effective noise scale.
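The probe-then-commit loop described above can be illustrated with a minimal sketch. It is not the authors' PtC-P-UCB implementation. It assumes $d=2$ objectives scaled to $[0,1]$, a standard UCB bonus of the form $\sqrt{2\log(t+1)/n}$, a greedy hypervolume-gain rule standing in for the frontier-coverage potential, and that the executed outcome equals the probed observation. All function and variable names (`hypervolume_2d`, `greedy_probe_set`, `run_ptc`) are hypothetical.

```python
import math
import random


def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by `points` (maximization) relative to `ref`."""
    pts = sorted((p for p in points if p[0] > ref[0] and p[1] > ref[1]),
                 key=lambda p: -p[0])
    area, best_y = 0.0, ref[1]
    for x, y in pts:  # sweep from largest x, adding only the new strip
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def greedy_probe_set(ucb, counts, q):
    """Greedily pick q arms whose optimistic vectors add the most hypervolume.

    Assumption: greedy selection is used here as a simple approximate
    maximizer; ties go to the least-probed arm.
    """
    chosen = []
    for _ in range(min(q, len(ucb))):
        picked = [ucb[j] for j in chosen]
        base = hypervolume_2d(picked)
        candidates = [i for i in range(len(ucb)) if i not in chosen]
        best = max(candidates,
                   key=lambda i: (hypervolume_2d(picked + [ucb[i]]) - base,
                                  -counts[i]))
        chosen.append(best)
    return chosen


def run_ptc(true_means, q, horizon, noise=0.1, seed=0):
    """Simulate the sketch; returns (attained hypervolume per round, probe counts)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [[0.0, 0.0] for _ in range(k)]
    attained, history = [], []
    for t in range(1, horizon + 1):
        # Optimistic estimate per arm, clipped to the outcome range [0, 1].
        ucb = []
        for i in range(k):
            if counts[i] == 0:
                ucb.append((1.0, 1.0))
            else:
                bonus = math.sqrt(2.0 * math.log(t + 1) / counts[i])
                ucb.append(tuple(min(1.0, s / counts[i] + bonus)
                                 for s in sums[i]))
        probes = greedy_probe_set(ucb, counts, q)
        # Probe: observe a noisy outcome vector for each selected arm.
        observed = {}
        for i in probes:
            y = tuple(min(1.0, max(0.0, m + rng.gauss(0.0, noise)))
                      for m in true_means[i])
            observed[i] = y
            counts[i] += 1
            sums[i][0] += y[0]
            sums[i][1] += y[1]
        # Commit: execute the probed arm with the largest marginal gain.
        base = hypervolume_2d(attained)
        commit = max(probes,
                     key=lambda i: hypervolume_2d(attained + [observed[i]]) - base)
        attained.append(observed[commit])
        history.append(hypervolume_2d(attained))
    return history, counts


if __name__ == "__main__":
    means = [(0.9, 0.2), (0.7, 0.6), (0.3, 0.9), (0.4, 0.4), (0.2, 0.3)]
    for q in (1, 3):
        hv, _ = run_ptc(means, q=q, horizon=200)
        print(f"q={q}: attained hypervolume after 200 rounds = {hv[-1]:.3f}")
```

Running the script compares $q=1$ with $q=3$ on a toy instance. Because the sketch swaps in a greedy rule and a generic confidence bonus, its behavior should not be taken as evidence for or against the stated bounds.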