🤖 AI Summary
This paper studies contextual online joint optimization of assortment selection and pricing: in each round, a seller observes a $d$-dimensional user preference context, dynamically selects at most $K$ items, and sets their prices; the user then chooses an item according to a Multinomial Logit (MNL) model with unknown parameters. The paper unifies contextual modeling, assortment optimization, and price decision-making within a principled online learning framework, proposing an algorithm with theoretical optimality guarantees. The method integrates contextual multi-armed bandits, generalized linear model estimation, and online MNL parameter learning, employing confidence-interval-driven optimistic planning. It achieves a near-tight cumulative regret bound of $\widetilde{O}(d\sqrt{KT}/L_0)$, nearly matching the fundamental lower bound $\Omega(d\sqrt{T}/L_0)$, where $L_0$ is the minimum price sensitivity. Experiments demonstrate significant improvements over existing baselines and quantify the impact of the context dimension $d$, the assortment size $K$, and the price sensitivity $L_0$ on performance.
📝 Abstract
We consider an assortment selection and pricing problem in which a seller has $N$ different items available for sale. In each round, the seller observes a $d$-dimensional contextual preference information vector for the user, and offers to the user an assortment of $K$ items at prices chosen by the seller. The user selects at most one of the products from the offered assortment according to a multinomial logit choice model whose parameters are unknown. The seller observes which, if any, item is chosen at the end of each round, with the goal of maximizing cumulative revenue over a selling horizon of length $T$. For this problem, we propose an algorithm that learns from user feedback and achieves a revenue regret of order $\widetilde{O}(d\sqrt{KT}/L_0)$, where $L_0$ is the minimum price sensitivity parameter. We also obtain a lower bound of order $\Omega(d\sqrt{T}/L_0)$ for the regret achievable by any algorithm.
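To make the choice model concrete, here is a minimal sketch of how a user's selection probabilities and the seller's expected revenue could be computed under an MNL model with a no-purchase outside option. The linear-utility form $u_i = \langle \theta_i, x \rangle - \alpha_i p_i$ and all variable names (`theta`, `alpha`, `L_0`-style price sensitivities) are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def mnl_choice_probs(utilities):
    """Choice probabilities under a multinomial logit (MNL) model with an
    outside 'no purchase' option whose utility is normalized to zero."""
    expu = np.exp(utilities)
    denom = 1.0 + expu.sum()          # the '1' is the outside option
    probs = expu / denom
    return probs, 1.0 - probs.sum()   # per-item probs, no-purchase prob

# Illustrative linear utilities: u_i = <theta_i, x> - alpha_i * p_i,
# where x is the observed context, theta_i an (unknown) preference vector,
# and alpha_i the item's price sensitivity (assumed bounded below by L_0).
rng = np.random.default_rng(0)
d, K = 5, 3
x = rng.normal(size=d)                  # observed context vector
theta = rng.normal(size=(K, d))         # item preference parameters
alpha = rng.uniform(0.5, 1.5, size=K)   # price sensitivities
prices = rng.uniform(1.0, 3.0, size=K)  # seller-chosen prices

u = theta @ x - alpha * prices
probs, p_no_purchase = mnl_choice_probs(u)
expected_revenue = float(prices @ probs)
```

The learning problem in the paper is then to choose, in each round, the assortment and prices that maximize this expected revenue when `theta` and `alpha` are unknown and must be estimated from which item (if any) the user picks.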