🤖 AI Summary
This study addresses semiparametric contextual pricing with an unknown utility function and an unknown additive noise distribution. Using a scalar-index valuation model, the authors make two assumptions: the noise tail is $\beta$-Hölder with $\beta\geq 2$, and the revenue has a unique, stable, interior maximizer. Under these assumptions, they show that the oracle price map has $(\beta\!-\!1)$-order smoothness. Their modular coarse-to-fine policy, ORBIT, works in two stages. It first localizes a benchmark price in each active bin of a scalar pilot index. It then learns a local polynomial approximation of the oracle map inside a trust region via bandit convex optimization. For linear utility, an adaptive elliptical exploration scheme builds the pilot index online without assumptions on the context distribution, and the policy attains cumulative regret $\widetilde{O}(T^{(2\beta-1)/(4\beta-3)} + \sqrt{dT})$. For fixed dimension $d$, a lower bound that matches in the horizon dependence shows that the nonparametric learning term is minimax sharp.
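The horizon exponent of the nonparametric term can be rewritten to show how it depends on $\beta$. This is simple algebra, not a result quoted from the paper:

$$
\frac{2\beta-1}{4\beta-3}=\frac12+\frac{1}{2(4\beta-3)},\qquad
\left.\frac{2\beta-1}{4\beta-3}\right|_{\beta=2}=\frac35,\qquad
\lim_{\beta\to\infty}\frac{2\beta-1}{4\beta-3}=\frac12 .
$$

For $\beta\geq 2$ the exponent therefore lies in $(1/2,\,3/5]$ and decreases toward the parametric rate $1/2$ as the noise tail gets smoother. For fixed $d$, the $T^{(2\beta-1)/(4\beta-3)}$ term dominates $\sqrt{dT}$ as $T\to\infty$.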
📝 Abstract
We study contextual dynamic pricing in a semiparametric scalar-index valuation model where the latent value is $v_t=\mu_\ast(\mathsf c_t)+\xi_t$, with an unknown utility map $\mu_\ast$ and an unknown additive noise distribution. The key decision object is the one-dimensional oracle price map $u\mapsto p^\ast(u)$ induced by the scalar index $u=\mu_\ast(\mathsf c)$ and the noise tail. Under $\beta$-Hölder smoothness of the tail function for $\beta\geq 2$ and a revenue-geometry condition that ensures a unique, stable, interior maximizer, this oracle map is itself $(\beta-1)$-smooth. We exploit this structure through $\mathsf{ORBIT}$, a modular coarse-to-fine policy that takes a scalar pilot index as input, localizes a benchmark price in each active bin, and learns a local polynomial approximation of the oracle map inside a trust region via bandit convex optimization. For the baseline linear utility model $\mu_\ast(\mathsf c)=\mathsf c^\top\theta_\ast$, an adaptive elliptical exploration scheme constructs the required scalar pilot online without distributional assumptions on the contexts. The resulting policy achieves regret $\widetilde{O}\big(T^{\frac{2\beta-1}{4\beta-3}}+\sqrt{dT}\big)$. For fixed $d$, we establish a matching lower bound in the horizon dependence, showing that the nonparametric oracle-map learning term is minimax sharp. The same scalar-pilot interface also yields extensions to sparse high-dimensional linear utility and to nonparametric Hölder utility.
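The following minimal sketch makes the oracle price map concrete. It assumes, purely for illustration, standard logistic noise, so the tail is $S(x)=\Pr(\xi>x)=1/(1+e^{x})$ and a posted price $p$ sells iff $v_t\geq p$. The oracle price at index $u$ then maximizes the expected revenue $p\,S(p-u)$. The helpers `logistic_tail`, `revenue`, and `oracle_price` are hypothetical names and are not part of $\mathsf{ORBIT}$. ORBIT itself has to learn this map without knowing the noise distribution.

```python
import math
from typing import Callable, Optional


def logistic_tail(x: float) -> float:
    """S(x) = P(xi > x) for standard logistic noise (an illustrative assumption)."""
    # Numerically stable evaluation of 1 / (1 + exp(x)).
    if x >= 0:
        z = math.exp(-x)
        return z / (1.0 + z)
    return 1.0 / (1.0 + math.exp(x))


def revenue(p: float, u: float, tail: Callable[[float], float] = logistic_tail) -> float:
    """Expected revenue p * P(u + xi >= p) when the scalar index is u."""
    return p * tail(p - u)


def oracle_price(
    u: float,
    tail: Callable[[float], float] = logistic_tail,
    lo: float = 0.0,
    hi: Optional[float] = None,
    iters: int = 200,
) -> float:
    """Approximate p*(u) = argmax_p revenue(p, u) by ternary search.

    Assumes the revenue curve is unimodal on [lo, hi], which holds for logistic noise.
    """
    if hi is None:
        hi = max(u, 0.0) + 20.0  # generous bracket chosen for this sketch
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        # Discard the third of the bracket that cannot contain the maximizer.
        if revenue(m1, u, tail) < revenue(m2, u, tail):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for u in (-1.0, 0.0, 1.0, 2.0):
        p = oracle_price(u)
        # For logistic noise the first-order condition is p = 1 + exp(u - p).
        print(f"u={u:+.1f}  p*={p:.4f}  FOC residual={p - 1.0 - math.exp(u - p):.1e}")
```

For logistic noise the first-order condition reduces to $p = 1 + e^{u-p}$. Equivalently, $p^\ast(u)=1+W(e^{u-1})$, where $W$ is the Lambert W function. This closed form gives a direct check on the search, for example $p^\ast(0)\approx 1.2785$. Ternary search is used here only because this revenue curve is unimodal. It is not the paper's trust-region bandit convex optimization step.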