Learning Multinomial Logits in $O(n \log n)$ time

📅 2026-01-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of efficiently learning the weight vector of a multinomial logit model so that, for every subset of alternatives, the predicted choice distribution is within total variation distance $\varepsilon$ of the true distribution. The authors propose both adaptive and non-adaptive algorithms that rely solely on queries to subsets of size two, operating in the conditional sampling oracle model, to reconstruct the full weight vector. Their main contributions are an adaptive algorithm with query complexity $O\left(\frac{n}{\varepsilon^{3}}\log n\right)$, which is optimal in its dependence on the support size $n$, and a non-adaptive algorithm with query complexity $O\left(\frac{n^{2}}{\varepsilon^{3}}\log n \log\frac{n}{\varepsilon}\right)$, which is tight up to a $\log n$ factor. They complement these with information-theoretic lower bounds of $\Omega\left(\frac{n}{\varepsilon^{2}}\log n\right)$ and $\Omega\left(\frac{n^{2}}{\varepsilon^{2}}\log n\right)$ for the adaptive and non-adaptive settings, respectively, demonstrating the near-optimality of both algorithms.
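The learning goal summarized above can be stated explicitly. Given positive weights $w_1, \dots, w_n$, the MNL model's choice distribution on a slate $S \subseteq [n]$ is

$$\Pr[\text{choose } i \mid S] = \frac{w_i}{\sum_{j \in S} w_j}, \qquad i \in S,$$

and the learned model $M'$ must satisfy $d_{\mathrm{TV}}(M'_S, M_S) \le \varepsilon$ simultaneously for every slate $S$, where $M_S$ denotes the true choice distribution on $S$.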

📝 Abstract
A Multinomial Logit (MNL) model is composed of a finite universe of items $[n]=\{1,..., n\}$, each assigned a positive weight. A query specifies an admissible subset -- called a slate -- and the model chooses one item from that slate with probability proportional to its weight. This query model is also known as the Plackett-Luce model or conditional sampling oracle in the literature. Although MNLs have been studied extensively, a basic computational question remains open: given query access to slates, how efficiently can we learn weights so that, for every slate, the induced choice distribution is within total variation distance $\varepsilon$ of the ground truth? This question is central to MNL learning and has direct implications for modern recommender system interfaces. We provide two algorithms for this task, one with adaptive queries and one with non-adaptive queries. Each algorithm outputs an MNL $M'$ that induces, for each slate $S$, a distribution $M'_S$ on $S$ that is within $\varepsilon$ total variation distance of the true distribution. Our adaptive algorithm makes $O\left(\frac{n}{\varepsilon^{3}}\log n\right)$ queries, while our non-adaptive algorithm makes $O\left(\frac{n^{2}}{\varepsilon^{3}}\log n \log\frac{n}{\varepsilon}\right)$ queries. Both algorithms query only slates of size two and run in time proportional to their query complexity. We complement these upper bounds with lower bounds of $\Omega\left(\frac{n}{\varepsilon^{2}}\log n\right)$ for adaptive queries and $\Omega\left(\frac{n^{2}}{\varepsilon^{2}}\log n\right)$ for non-adaptive queries, thus proving that our adaptive algorithm is optimal in its dependence on the support size $n$, while the non-adaptive one is tight within a $\log n$ factor.
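To make the query model concrete, here is a minimal sketch (not the authors' algorithm) of a conditional sampling oracle together with a naive non-adaptive estimator that, as in the paper, queries only slates of size two: each item $i$ is compared against a fixed reference item, and since a win rate $p$ against reference weight $w_0 = 1$ satisfies $p = w_i/(1 + w_i)$, the ratio $p/(1-p)$ recovers $w_i$. The function names and the per-pair query budget are illustrative assumptions, not from the paper.

```python
import random

def mnl_choice(weights, slate, rng):
    # Conditional sampling oracle: choose one item from the slate with
    # probability proportional to its weight.
    total = sum(weights[i] for i in slate)
    r = rng.random() * total
    acc = 0.0
    for i in slate:
        acc += weights[i]
        if r < acc:
            return i
    return slate[-1]  # guard against floating-point rounding

def estimate_weights(weights, n, queries_per_pair, rng):
    # Naive sketch: compare every item against item 0 on slates of size two
    # and estimate w_i from the empirical win rate. Weights are identified
    # only up to a common scale, so we fix w_0 = 1.
    est = [1.0] * n
    for i in range(1, n):
        wins = sum(mnl_choice(weights, (0, i), rng) == i
                   for _ in range(queries_per_pair))
        p = wins / queries_per_pair
        p = min(max(p, 1e-6), 1 - 1e-6)  # clamp away from 0 and 1
        est[i] = p / (1 - p)  # invert p = w_i / (w_0 + w_i) with w_0 = 1
    return est
```

This naive scheme already shows why pairwise slates suffice to pin down all weight ratios; the paper's contribution is doing so with an optimal (adaptive) or near-optimal (non-adaptive) number of queries, rather than a fixed budget per pair.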
Problem

Research questions and friction points this paper is trying to address.

Multinomial Logit
total variation distance
query complexity
Plackett-Luce model
conditional sampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multinomial Logit
adaptive queries
query complexity
total variation distance
Plackett-Luce model