🤖 AI Summary
This work addresses the dynamic pricing problem under adversarial corruption, where existing regret bounds couple the corruption level $C$ with the time horizon $T$ multiplicatively. The authors propose a robust binary search strategy integrated with an adaptive feedback handling mechanism, which tolerates up to $C$ rounds of adversarially corrupted feedback. When $C$ is known, the algorithm achieves a regret bound of $O(C + \log T)$; when $C$ is unknown, it attains $O(C + \log^2 T)$. This result is the first to decouple the dependence of regret on $C$ and $T$, resolving the open problem left by the previous best-known bound of $O(C \log\log T)$ due to Gupta et al. [2025].
📝 Abstract
We establish the first regret guarantees for robust dynamic pricing that decouple the dependence on the corruption $C$ and the time horizon $T$. In dynamic pricing, a seller with unlimited supply of a good interacts with a stream of buyers over $T$ rounds, with the goal of maximizing revenue. At each round $t$, the seller posts a price $p_t$, and the buyer purchases the good only if their unknown valuation $v^\star$ is at least this price. The seller observes only the binary feedback $\mathbb{I} \left\{ p_t \leq v^\star \right\}$, indicating whether a sale occurred. In the \emph{robust} pricing setting, a malicious adversary is allowed to corrupt this feedback in at most $C$ rounds. Even if the learner knows the corruption $C$, the best known regret bound is $\mathcal{O}(C\log\log T)$ by Gupta et al. [2025]. This leaves open the problem of ``decoupling'' the dependence on $C$ and $T$. In this work, we resolve this open problem. In particular, we develop a robust variant of binary search that achieves regret $\mathcal{O}(C+\log T)$ when the corruption $C$ is known and $\mathcal{O}(C+\log^2 T)$ when the corruption is unknown.
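To make the feedback model concrete, the following is a minimal Python sketch of the interaction described above. It is not the robust algorithm from the paper: it runs a plain, non-robust binary search, and everything in it is an illustrative assumption (the function names `true_feedback` and `naive_binary_search`, the `corrupted_rounds` parameter, valuations normalized to $[0, 1]$, and an adversary that simply flips the feedback bit). It only shows why corrupted feedback is a problem for ordinary binary search.

```python
def true_feedback(price, valuation):
    """Uncorrupted binary feedback I{p_t <= v*}: 1 if a sale occurs, else 0."""
    return 1 if price <= valuation else 0


def naive_binary_search(valuation, horizon, corrupted_rounds=frozenset()):
    """Plain (non-robust) binary search for the valuation over [0, 1].

    Assumptions for illustration only:
      - valuations are normalized to the interval [0, 1];
      - `corrupted_rounds` is a hypothetical set of round indices in which
        the adversary flips the feedback bit (its size plays the role of C).
    Returns the final search interval (lo, hi).
    """
    lo, hi = 0.0, 1.0
    for t in range(horizon):
        price = (lo + hi) / 2                 # post the midpoint as the price p_t
        feedback = true_feedback(price, valuation)
        if t in corrupted_rounds:             # adversary corrupts this round
            feedback = 1 - feedback
        if feedback:                          # observed "sale": believe v* >= p_t
            lo = price
        else:                                 # observed "no sale": believe v* < p_t
            hi = price
    return lo, hi


if __name__ == "__main__":
    v_star = 0.7
    print("clean:    ", naive_binary_search(v_star, 20))
    print("corrupted:", naive_binary_search(v_star, 20, corrupted_rounds={0}))
```

In this sketch, the uncorrupted run ends with a narrow interval that still contains $v^\star = 0.7$, whereas a single flipped bit in the first round discards the half of the interval containing $v^\star$, and the search never recovers. A robust variant has to avoid committing irrevocably to feedback that may have been corrupted.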