Optimal Design for Multinomial Logit Model with Applications to Best Assortment Identification

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the intractability of conventional optimal experimental design in multinomial logit (MNL) bandits, whose action space is combinatorial. The authors propose an optimal design framework with two complementary components: a reformulation of the design oracle as a 0–1 mixed-integer linear program (MILP) with solver-certified early stopping, and a fully polynomial-time lifted design that replaces the nonlinear objective with a tractable surrogate. Using the Kiefer-Wolfowitz equivalence theorem, they establish near-G-optimality guarantees and characterize the resulting statistical-computational trade-off. Applied to best assortment identification with linear utilities and non-uniform revenues, the method achieves an instance-dependent sample complexity of Õ(d log N / Δ²), where d is the feature dimension, N the number of items, and Δ the minimum revenue gap.
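To make the setting concrete, the following is a minimal sketch of single-choice MNL feedback and of how fast the action space grows. It is not the paper's code. The outside (no-purchase) option with utility 0, the random features, and the names `mnl_choice_probs`, `features`, and `theta` are all illustrative assumptions.

```python
import math
import numpy as np

def mnl_choice_probs(features, theta, assortment):
    """Return MNL choice probabilities for an offered assortment.

    Entry 0 is an assumed outside (no-purchase) option with utility 0;
    entries 1..K follow the order of `assortment`.
    """
    utils = features[assortment] @ theta      # linear utilities x_i^T theta
    shift = max(0.0, float(utils.max()))      # subtract the max for numerical stability
    weights = np.exp(utils - shift)
    outside = math.exp(-shift)                # exp(0 - shift) for the outside option
    return np.concatenate(([outside], weights)) / (outside + weights.sum())

rng = np.random.default_rng(0)
N, K, d = 50, 5, 3                            # hypothetical problem sizes
features = rng.normal(size=(N, d))            # hypothetical item features
theta = rng.normal(size=d)                    # hypothetical true parameter

assortment = np.array([0, 7, 12, 30, 49])     # one subset of K items
probs = mnl_choice_probs(features, theta, assortment)
choice = int(rng.choice(K + 1, p=probs))      # single-choice feedback (0 = no purchase)

# Number of size-K assortments a naive design would have to enumerate.
num_assortments = math.comb(N, K)
print(probs, choice, num_assortments)
```

Even at these small sizes there are over two million candidate assortments, which is why optimizing a design by enumerating all subsets does not scale.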

📝 Abstract

We study optimal experimental design for multinomial logit (MNL) bandits, where an agent repeatedly selects a subset of $K$ items from a ground set of size $N$ and observes single-choice feedback. Unlike linear or generalized linear bandits, MNL bandits have a combinatorial action space, which makes classical optimal design approaches and naive optimization over all subsets computationally intractable. We propose a computationally efficient optimal design framework for MNL models that achieves both statistical efficiency and scalability through two complementary approaches: (i) an exact or certified-approximate reformulation of the design oracle as a $0$-$1$ mixed-integer linear program (MILP) with solver-certified early stopping, and (ii) a fully polynomial-time lifted design that replaces the nonlinear objective with a tractable surrogate. Using the Kiefer-Wolfowitz equivalence theorem, we establish near-G-optimality guarantees and characterize the induced statistical-computational trade-offs. As an application, we develop a best assortment identification algorithm for MNL bandits with linear utilities and non-uniform revenues, and prove an instance-dependent sample complexity of $\tilde{O}\big(\frac{d \log N}{\Delta^2}\big)$, where $d$ is the feature dimension, $N$ is the number of arms, and $\Delta$ is the minimum revenue gap.
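For readers unfamiliar with the Kiefer-Wolfowitz equivalence theorem, here is a minimal sketch of the classical linear-model case only. It is not the paper's MNL design oracle, MILP reformulation, or lifted surrogate. It runs a standard Frank-Wolfe (Fedorov-Wynn style) iteration over a small finite set of action vectors and checks the theorem's optimality condition, namely that the largest leverage $\max_x x^\top V(\pi)^{-1} x$ equals the dimension $d$ at an optimal design. The function names, the tolerance, and the random action set are assumptions made for illustration.

```python
import numpy as np

def leverages(X, pi):
    """Return x_i^T V(pi)^{-1} x_i for each row, with V(pi) = sum_i pi_i x_i x_i^T."""
    V = X.T @ (pi[:, None] * X)
    return np.einsum("ij,ij->i", X @ np.linalg.inv(V), X)

def g_optimal_design(X, tol=0.01, max_iter=10_000):
    """Frank-Wolfe iteration for an approximately G-optimal design over the rows of X."""
    n, d = X.shape
    pi = np.full(n, 1.0 / n)                  # start from the uniform design
    for _ in range(max_iter):
        g = leverages(X, pi)
        i = int(np.argmax(g))
        if g[i] <= (1.0 + tol) * d:           # Kiefer-Wolfowitz: the optimum has max g = d
            break
        # Closed-form line-search step for the log-determinant objective.
        gamma = (g[i] / d - 1.0) / (g[i] - 1.0)
        pi *= 1.0 - gamma
        pi[i] += gamma
    return pi, float(leverages(X, pi).max())

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))                  # 30 hypothetical actions in d = 3
pi, g_max = g_optimal_design(X)
print(g_max, int((pi > 1e-6).sum()))          # max leverage and support size
```

The loop touches every action once per iteration. That is affordable for 30 vectors but not when the actions are all size-$K$ subsets of $N$ items, which is the difficulty the abstract describes.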

Problem

Research questions and friction points this paper is trying to address.

multinomial logit

optimal experimental design

combinatorial action space

best assortment identification

MNL bandits

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multinomial Logit Bandits

Optimal Experimental Design

Mixed-Integer Linear Programming