Harnessing Unimodality in Semiparametric Contextual Pricing via Oracle Price Map Learning

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses semiparametric contextual pricing in which both the utility function and the additive noise distribution are unknown. Working with a scalar-index valuation model, the authors show that, under $\beta$-Hölder smoothness of the noise tail ($\beta \geq 2$) and a revenue-geometry condition that guarantees a unique, stable, interior revenue maximizer, the oracle price map is $(\beta-1)$-smooth. Their modular coarse-to-fine policy, ORBIT, localizes a benchmark price in each active bin and then learns a local polynomial approximation of the oracle map inside a trust region via bandit convex optimization. For linear utility, an adaptive elliptical exploration scheme builds the required scalar pilot index online without assumptions on the context distribution, and the policy attains cumulative regret $\widetilde{O}(T^{(2\beta-1)/(4\beta-3)} + \sqrt{dT})$. For fixed dimension $d$, a lower bound matching this rate in its horizon dependence shows that the nonparametric oracle-map learning term is minimax sharp.
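The horizon exponent in this bound is easy to tabulate. The Python sketch below (the names `horizon_exponent` and `regret_order` are illustrative, not from the paper) evaluates the order of the bound with constants and logarithmic factors dropped, as the $\widetilde{O}$ notation permits. Because $(2\beta-1)/(4\beta-3) > 1/2$ for every $\beta \geq 2$ (the inequality reduces to $4\beta-2 > 4\beta-3$), the nonparametric term eventually dominates $\sqrt{dT}$ when $d$ is fixed, and the exponent decreases from $3/5$ at $\beta = 2$ toward $1/2$ as $\beta \to \infty$.

```python
import math
from fractions import Fraction
from typing import Union

Number = Union[int, Fraction]

def horizon_exponent(beta: Number) -> Fraction:
    """Exponent of T in the nonparametric term T^{(2*beta-1)/(4*beta-3)}."""
    beta = Fraction(beta)  # exact rational arithmetic; avoids int/int -> float
    if beta < 2:
        raise ValueError("the bound is stated for beta >= 2")
    return (2 * beta - 1) / (4 * beta - 3)

def regret_order(T: float, d: int, beta: Number) -> float:
    """T^{(2*beta-1)/(4*beta-3)} + sqrt(d*T), ignoring constants and log factors."""
    return T ** float(horizon_exponent(beta)) + math.sqrt(d * T)

if __name__ == "__main__":
    # Exponent shrinks toward 1/2 as the tail gets smoother.
    for beta in (2, 3, 5, 10):
        print(beta, horizon_exponent(beta), round(float(horizon_exponent(beta)), 4))
```

For example, `horizon_exponent(2)` returns `Fraction(3, 5)`, so with $\beta = 2$ the nonparametric term grows like $T^{0.6}$.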

📝 Abstract

We study contextual dynamic pricing in a semiparametric scalar-index valuation model where the latent value is $v_t=\mu_\ast(\mathsf c_t)+\xi_t$, with an unknown utility map $\mu_\ast$ and an unknown additive noise distribution. The key decision object is the one-dimensional oracle price map $u\mapsto p^\ast(u)$ induced by the scalar index $u=\mu_\ast(\mathsf c)$ and the noise tail. Under $\beta$-Hölder smoothness of the tail function for $\beta\geq 2$ and a revenue-geometry condition that guarantees a unique, stable, interior maximizer, this oracle map is itself $(\beta-1)$-smooth. We exploit this structure through $\mathsf{ORBIT}$, a modular coarse-to-fine policy that takes a scalar pilot index as input, localizes a benchmark price in each active bin, and learns a local polynomial approximation of the oracle map inside a trust region via bandit convex optimization. For the baseline linear utility model $\mu_\ast(\mathsf c)=\mathsf c^\top\theta_\ast$, an adaptive elliptical exploration scheme constructs the required scalar pilot online without distributional assumptions on the contexts. The resulting policy achieves regret $\widetilde{O}\big(T^{\frac{2\beta-1}{4\beta-3}}+\sqrt{dT}\big)$. For fixed $d$, we establish a matching lower bound in the horizon dependence, showing that the nonparametric oracle-map learning term is minimax sharp. The same scalar-pilot interface also yields extensions to sparse high-dimensional linear utility and nonparametric Hölder utility.
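To make the oracle price map concrete: assuming the common convention that a customer with index $u$ buys whenever $v = u + \xi \geq p$, the purchase probability is $S(p-u)$ with $S(x) = \mathbb{P}(\xi \geq x)$ the noise tail, and $p^\ast(u) = \arg\max_{p} \, p\,S(p-u)$. The Python sketch below additionally assumes, purely for illustration, logistic noise with known scale $s$. With the tail known, the unimodal revenue curve can be maximized by golden-section search; this is a full-information stand-in for the benchmark, not ORBIT, which must learn the map from bandit feedback without knowing $S$. For logistic noise the first-order condition reduces to the fixed point $p = s\,(1 + e^{-(p-u)/s})$, which gives a convenient correctness check.

```python
import math
from typing import Callable, Optional

def logistic_tail(x: float, scale: float = 1.0) -> float:
    """Tail S(x) = P(xi >= x) of logistic noise (illustrative choice, not from the paper)."""
    z = x / scale
    if z >= 0:  # numerically stable branch for large positive z
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def revenue(p: float, u: float, tail: Callable[[float], float]) -> float:
    """Expected revenue p * S(p - u), assuming a purchase iff u + xi >= p."""
    return p * tail(p - u)

def oracle_price(u: float, tail: Callable[[float], float], lo: float = 0.0,
                 hi: Optional[float] = None, tol: float = 1e-9) -> float:
    """Golden-section search for argmax_p p * S(p - u) on [lo, hi].

    Correct only if revenue is unimodal on the bracket, which is assumed here.
    """
    if hi is None:
        hi = max(1.0, u) + 20.0  # hypothetical bracket, wide enough for this example
    invphi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = revenue(c, u, tail), revenue(d, u, tail)
    while b - a > tol:
        if fc > fd:  # maximizer lies in [a, d]; old c becomes the new d
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = revenue(c, u, tail)
        else:        # maximizer lies in [c, b]; old d becomes the new c
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = revenue(d, u, tail)
    return 0.5 * (a + b)

if __name__ == "__main__":
    for u in (-2.0, 0.0, 2.0):
        p = oracle_price(u, logistic_tail)
        residual = p - (1.0 + math.exp(-(p - u)))  # logistic first-order condition, s = 1
        print(f"u={u:+.1f}  p*={p:.6f}  FOC residual={residual:.1e}")
```

With $s = 1$ and $u = 0$ the fixed point is $1 + W(1/e) \approx 1.2785$, where $W$ is the Lambert W function, and the computed $p^\ast(u)$ increases smoothly in $u$. Golden-section search is used only because it needs nothing beyond unimodality; other scales can be passed as, e.g., `lambda x: logistic_tail(x, scale=2.0)`.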

Problem

contextual pricing

semiparametric model

oracle price map

unimodality

nonparametric learning

Innovation

oracle price map

semiparametric pricing

unimodality