π€ AI Summary
This work proposes a dynamic pricing framework that operates without assuming a parametric form of the demand function, under the challenging setting where only single-point revenue observations are available and market conditions evolve non-stationarily. The approach constructs a nonparametric gradient estimator from single-point revenue feedback to iteratively update prices and incorporates a restart mechanism to handle abrupt environmental shifts. When the degree of non-stationarity is unknown, a meta-learning layer is introduced to adaptively combine multiple restart strategies. Theoretical analysis establishes an upper bound on the cumulative revenue regret, and extensive experiments on both synthetic and real-world data demonstrate the methodβs effectiveness and robustness in non-stationary markets. This study represents the first integration of nonparametric learning, single-point feedback-based gradient estimation, and adaptive restarting, achieving provably sound performance guarantees.
π Abstract
Firms increasingly rely on dynamic pricing to respond to evolving customer demand, yet in many applications they observe only the revenue generated by a single posted price in each period. At the same time, market conditions may shift gradually or abruptly due to changes in customer preferences, competition, or external shocks. These features create two intertwined challenges: learning the revenue--demand relationship from limited feedback and adapting pricing decisions to a changing environment. We study how a seller can learn and earn effectively under these constraints, without assuming a specific parametric form for demand. We develop a learning framework that updates prices using revenue-based gradient approximations constructed from one observation per period. To address environmental changes, we incorporate a restarting mechanism that periodically refreshes the learning process so that outdated information is discounted. When the degree of nonstationarity is unknown, we further introduce a meta-learning layer to adaptively hedge across multiple restarting schedules. We provide performance guarantees for our approach, showing how cumulative revenue loss relative to a fully informed benchmark depends on both the time horizon and the magnitude of market variation. Simulation experiments using synthetic and real-world data illustrate the effectiveness of the proposed procedures.