🤖 AI Summary
This paper studies contextual dynamic pricing under a linear demand model, focusing on the exploration-exploitation trade-off and on regret bounds that do not depend on the covariate dimension. It proposes the localized exploration-then-commit (LetC) algorithm, which proceeds in three stages: pure exploration, a refinement stage that explores near the learned optimal pricing policy, and pure exploitation. The contributions are threefold: (i) a minimax-optimal, dimension-free regret bound that holds when the time horizon exceeds a polynomial of the covariate dimension; (ii) a general theoretical framework covering the full range of time horizons, which shows how to balance exploration and exploitation when the horizon is limited; and (iii) a novel critical inequality that captures the exploration-exploitation trade-off in dynamic pricing, mirroring its counterpart for the bias-variance trade-off in regularized regression. Experiments on synthetic and real-world data support the theoretical results.
📝 Abstract
We study the problem of contextual dynamic pricing with a linear demand model. We propose a novel localized exploration-then-commit (LetC) algorithm, which starts with a pure exploration stage, continues with a refinement stage that explores near the learned optimal pricing policy, and ends with a pure exploitation stage. The algorithm is shown to achieve a minimax-optimal, dimension-free regret bound when the time horizon exceeds a polynomial of the covariate dimension. Furthermore, we provide a general theoretical framework that covers the full range of time horizons, demonstrating how to balance exploration and exploitation when the horizon is limited. The analysis is powered by a novel critical inequality that characterizes the exploration-exploitation trade-off in dynamic pricing, mirroring its existing counterpart for the bias-variance trade-off in regularized regression. Our theoretical results are validated by extensive experiments on synthetic and real-world data.
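The three-stage structure described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' LetC algorithm: the demand model (demand = x·a − b·p + noise, with a scalar price sensitivity b), the stage lengths, the perturbation radius, the price bounds, and the function name `simulate_three_stage_pricing` are all assumptions chosen for illustration, and plain least squares stands in for whatever estimator the paper uses.

```python
import numpy as np


def simulate_three_stage_pricing(T=3000, d=3, t_explore=300, t_refine=700,
                                 radius=0.3, p_lo=0.5, p_hi=5.0, seed=0):
    """Toy explore / refine / commit pricing loop (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    a = np.full(d, 1.5)  # hypothetical baseline-demand coefficients
    b = 1.0              # hypothetical price sensitivity
    X, P, D = [], [], []
    theta = None         # least-squares estimate of (a, b), set after stage 1
    totals = {"explore": 0.0, "refine": 0.0, "commit": 0.0}
    counts = {"explore": 0, "refine": 0, "commit": 0}

    def greedy_price(est, x):
        # Revenue p * (x.a - b p) is maximized at p = x.a / (2 b).
        a_hat, b_hat = est[:d], max(est[d], 1e-3)
        return float(np.clip(x @ a_hat / (2.0 * b_hat), p_lo, p_hi))

    for t in range(T):
        x = rng.uniform(0.5, 1.5, size=d)  # context observed before pricing
        if t < t_explore:
            stage = "explore"
            p = rng.uniform(p_lo, p_hi)  # stage 1: price uniformly at random
        elif t < t_explore + t_refine:
            stage = "refine"
            # Stage 2: explore only in a small neighborhood of the greedy price.
            p = greedy_price(theta, x) + rng.uniform(-radius, radius)
            p = float(np.clip(p, p_lo, p_hi))
        else:
            stage = "commit"
            p = greedy_price(theta, x)  # stage 3: pure exploitation

        demand = x @ a - b * p + rng.normal(scale=0.5)
        X.append(x)
        P.append(p)
        D.append(demand)

        # Refit once at the end of stage 1 and once at the end of stage 2.
        if t + 1 in (t_explore, t_explore + t_refine):
            Z = np.column_stack([np.array(X), -np.array(P)])
            theta, *_ = np.linalg.lstsq(Z, np.array(D), rcond=None)

        # Regret against the oracle price, measured in expected (noise-free) revenue.
        p_star = float(np.clip(x @ a / (2.0 * b), p_lo, p_hi))
        totals[stage] += p_star * (x @ a - b * p_star) - p * (x @ a - b * p)
        counts[stage] += 1

    # Average per-round regret in each stage.
    return {k: totals[k] / counts[k] for k in totals}


if __name__ == "__main__":
    for stage, avg in simulate_three_stage_pricing().items():
        print(f"{stage}: average per-round regret = {avg:.4f}")
```

In this toy setting the per-round regret equals b·(p − p*)², so it is large under uniform exploration and much smaller once prices stay close to the estimated optimum. How long to spend in each stage is exactly the trade-off the paper's analysis addresses; the lengths used here are arbitrary.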