AI Summary
This work addresses the joint assortment and pricing problem across multiple markets, where leveraging source-market data can introduce systematic bias due to heterogeneous consumer preferences. The authors propose the TJAP framework, which models cross-market preference heterogeneity through a structured utility shift and integrates an aggregation-debiasing estimator with a double-radius UCB strategy to simultaneously handle statistical uncertainty and transfer-induced bias. Theoretical analysis establishes that the method achieves a minimax-optimal regret bound of $\tilde{O}(d \sqrt{T/(1+H)} + s_0 \sqrt{T})$ under the variance-bias trade-off. Empirical results demonstrate that TJAP significantly outperforms approaches relying solely on target-market data or naive pooling, while exhibiting robustness to cross-market discrepancies.
Abstract
We study transfer learning for contextual joint assortment-pricing under a multinomial logit choice model with bandit feedback. A seller operates across multiple related markets and observes only posted prices and realized purchases. While data from source markets can accelerate learning in a target market, cross-market differences in customer preferences may introduce systematic bias if pooled indiscriminately.
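To make the choice model concrete, the following is a minimal sketch of multinomial logit purchase probabilities and expected revenue for one offered assortment and price vector. The utility form (a base utility minus a price-sensitivity term) and all names (`mnl_choice_probs`, `expected_revenue`, `base_utility`, `price_sensitivity`) are illustrative assumptions, not the paper's notation.

```python
import math

def mnl_choice_probs(utilities):
    """Purchase probabilities under MNL with a no-purchase option of utility 0.

    `utilities` maps each offered item to its price-adjusted utility.
    Returns (per-item probabilities, no-purchase probability).
    """
    # Shift by the largest utility (outside option included) for numerical stability.
    m = max(0.0, max(utilities.values(), default=0.0))
    weights = {i: math.exp(u - m) for i, u in utilities.items()}
    outside = math.exp(-m)
    denom = outside + sum(weights.values())
    return {i: w / denom for i, w in weights.items()}, outside / denom

def expected_revenue(assortment, prices, base_utility, price_sensitivity):
    """Expected revenue of offering `assortment` at `prices`.

    Assumed utility form (hypothetical): base_utility[i] - price_sensitivity[i] * prices[i].
    """
    utilities = {
        i: base_utility[i] - price_sensitivity[i] * prices[i] for i in assortment
    }
    probs, _ = mnl_choice_probs(utilities)
    return sum(prices[i] * probs[i] for i in assortment)
```

Under bandit feedback the seller would observe only which item, if any, was bought at the posted prices, not these probabilities themselves.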
We model heterogeneity through a structured utility shift, where markets share a common contextual utility structure but differ along a sparse set of latent preference coordinates. Building on this, we develop Transfer Joint Assortment-Pricing (TJAP), a bias-aware framework that combines aggregate-then-debias estimation with a UCB-style policy. TJAP constructs two-radius confidence bounds that separately capture statistical uncertainty and transfer-induced bias, uniformly over continuous prices.
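The structured shift and the two-radius bound can be sketched schematically as follows. This is only an illustration under assumptions: utilities are taken to be linear in a feature vector, the shift is stored as a sparse dictionary of coordinates, and the two radii are supplied as plain numbers rather than derived from data. The function names are hypothetical and the sketch does not reproduce TJAP's estimator.

```python
def market_parameter(shared, shift):
    """Market parameter = shared parameter plus a sparse shift.

    `shift` maps a coordinate index to its offset; coordinates not listed are unshifted.
    """
    theta = list(shared)
    for k, delta in shift.items():
        theta[k] += delta
    return theta

def utility(theta, features):
    # Assumed linear contextual utility: inner product of parameter and features.
    return sum(t * x for t, x in zip(theta, features))

def optimistic_utility(estimate, stat_radius, bias_radius):
    # Schematic two-radius upper bound: one radius for estimation noise,
    # one for bias that may remain after transferring source-market data.
    return estimate + stat_radius + bias_radius

# Example: four coordinates, with the target market shifted only on coordinate 2.
shared = [1.0, -0.5, 0.2, 0.0]
target = market_parameter(shared, {2: 0.3})
```

In this toy example, a context with zero weight on coordinate 2 has the same utility in both markets, which is the sense in which source data remains informative along shared directions.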
We establish matching minimax regret bounds of order $\tilde{O}\!\left(d\sqrt{\frac{T}{1+H}} + s_0\sqrt{T}\right)$, revealing a transparent variance-bias tradeoff: transfer accelerates learning along shared preference directions, while heterogeneous components impose an irreducible adaptation cost. Numerical experiments corroborate the theory, showing that TJAP outperforms both target-only learning and naive pooling while remaining robust to cross-market differences.
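The two terms of the bound can be compared numerically with a short sketch. It drops constants and logarithmic factors, and it assumes the natural reading of the symbols, which this abstract does not define: $d$ as the feature dimension, $T$ as the horizon, $H$ as the number of source markets, and $s_0$ as the sparsity of the shift. The function name `regret_rate` is hypothetical.

```python
import math

def regret_rate(d, T, H, s0):
    """Leading-order terms of the bound, ignoring constants and log factors.

    Returns (shared_term, shift_term):
      shared_term = d * sqrt(T / (1 + H))  shrinks as more source markets are pooled
      shift_term  = s0 * sqrt(T)           does not depend on H
    """
    shared_term = d * math.sqrt(T / (1 + H))
    shift_term = s0 * math.sqrt(T)
    return shared_term, shift_term

# With d = 20, T = 10_000, s0 = 2:
#   H = 0  -> (2000.0, 200.0)   target-only learning
#   H = 3  -> (1000.0, 200.0)
#   H = 99 -> (200.0, 200.0)    the shift term is now as large as the shared term
```

As $H$ grows, the first term falls while the second stays fixed, so the benefit of additional source markets levels off once $d/\sqrt{1+H}$ is comparable to $s_0$.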