How to Sell High-Dimensional Data Optimally

📅 2025-10-16
🤖 AI Summary
This paper studies how a monopolistic seller can design a revenue-maximizing menu of statistical experiments for high-dimensional data when the seller does not know the buyer's private preferences. It proposes a sampling-based algorithm that constructs a near-optimal menu using a number of samples independent of the size of the state space, which is itself exponential in the data dimension. For high-dimensional Gaussian data, it shows that scalar Gaussian experiments suffice, that the optimal menu of such experiments can be computed efficiently via a semidefinite program, and that full surplus extraction is achievable if and only if the set of potential buyer preferences satisfies a natural separation condition. Together, these results give a framework for pricing high-dimensional data that avoids the computational bottleneck of enumerating the state space.

📝 Abstract
Motivated by the problem of selling large, proprietary data, we consider an information pricing problem proposed by Bergemann et al. that involves a decision-making buyer and a monopolistic seller. The seller has access to the underlying state of the world that determines the utility of the various actions the buyer may take. Since the buyer gains greater utility through better decisions resulting from more accurate assessments of the state, the seller can therefore promise the buyer supplemental information at a price. To contend with the fact that the seller may not be perfectly informed about the buyer's private preferences (or utility), we frame the problem of designing a data product as one where the seller designs a revenue-maximizing menu of statistical experiments. Prior work by Cai et al. showed that an optimal menu can be found in time polynomial in the state space, whereas we observe that the state space is naturally exponential in the dimension of the data. We propose an algorithm which, given only sampling access to the state space, provably generates a near-optimal menu with a number of samples independent of the state space. We then analyze a special case of high-dimensional Gaussian data, showing that (a) it suffices to consider scalar Gaussian experiments, (b) the optimal menu of such experiments can be found efficiently via a semidefinite program, and (c) full surplus extraction occurs if and only if a natural separation condition holds on the set of potential preferences of the buyer.
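To make "statistical experiment" concrete, here is a minimal sketch of how a buyer might value one in a small finite setting. It is not the paper's algorithm: the function name `experiment_value`, the finite state, action, and signal sets, and the matrix encodings are all assumptions chosen for illustration. The value is the buyer's expected utility when acting optimally after seeing the signal, minus the expected utility of the best action under the prior alone.

```python
def experiment_value(prior, utility, experiment):
    """Expected gain to the buyer from observing a signal before acting.

    prior[s]         -- probability of state s (illustrative finite state space)
    utility[s][a]    -- buyer's utility from action a when the state is s
    experiment[s][k] -- probability of signal k given state s
    """
    n_states = len(prior)
    n_actions = len(utility[0])
    n_signals = len(experiment[0])

    def best_expected_utility(weights):
        # Best action against a (possibly unnormalized) distribution over states.
        return max(
            sum(weights[s] * utility[s][a] for s in range(n_states))
            for a in range(n_actions)
        )

    # No information: one action chosen against the prior.
    baseline = best_expected_utility(prior)

    # With information: for each signal, act optimally against the joint
    # weights P(state, signal); summing over signals gives expected utility.
    informed = 0.0
    for k in range(n_signals):
        joint_k = [prior[s] * experiment[s][k] for s in range(n_states)]
        informed += best_expected_utility(joint_k)

    return informed - baseline
```

With two equally likely states and a buyer who earns 1 for matching the state, a fully revealing experiment is worth 0.5, an experiment that is correct 80% of the time is worth about 0.3, and an uninformative one is worth 0.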
Problem

Research questions and friction points this paper is trying to address.

Designing revenue-maximizing data products for selling high-dimensional information
Developing efficient algorithms for optimal menus with exponential state spaces
Analyzing surplus extraction conditions in high-dimensional Gaussian data settings
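The revenue objective behind these questions can be sketched for a small finite menu. This is an illustrative setup, not the paper's formulation: the name `menu_revenue`, the finite set of buyer types, the value table, and the tie-breaking rule (ties broken toward the higher price) are assumptions. Each type buys the item that maximizes value minus price, or opts out if every item gives negative surplus.

```python
def menu_revenue(type_probs, values, prices):
    """Expected seller revenue from a posted menu.

    type_probs[t] -- probability of buyer type t (illustrative finite type set)
    values[t][i]  -- value of menu item i to type t
    prices[i]     -- posted price of item i
    """
    revenue = 0.0
    for prob, row in zip(type_probs, values):
        # Items this type is willing to buy: non-negative surplus.
        acceptable = [
            (value - price, price)
            for value, price in zip(row, prices)
            if value - price >= 0
        ]
        if not acceptable:
            continue  # the type opts out and pays nothing
        # Highest surplus wins; ties are broken toward the higher price
        # (an assumed seller-favorable tie-breaking rule).
        _, paid = max(acceptable)
        revenue += prob * paid
    return revenue
```

For example, with two equally likely types, values `[[1.0, 0.5], [0.5, 0.5]]`, and prices `[0.75, 0.5]`, the first type buys the first item and the second type buys the second, for an expected revenue of 0.625. Full surplus extraction would correspond to each type paying exactly its value for the item it buys.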
Innovation

Methods, ideas, or system contributions that make the work stand out.

Algorithm uses sampling for near-optimal menu design
Scalar Gaussian experiments simplify high-dimensional data pricing
Semidefinite program efficiently finds optimal experiment menu
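One plausible reading of a scalar Gaussian experiment is a single noisy linear measurement of the high-dimensional state. The sketch below shows the standard Gaussian conditioning update for that case; the zero-mean prior, the parameter names, and the function `scalar_gaussian_posterior` are assumptions for illustration, and the paper's exact definition may differ.

```python
def scalar_gaussian_posterior(cov, direction, noise_var, signal):
    """Posterior of theta ~ N(0, cov) after one scalar observation.

    The observation is signal = direction . theta + noise, with
    noise ~ N(0, noise_var). cov is assumed symmetric.
    Returns (posterior_mean, posterior_cov) as plain lists.
    """
    d = len(cov)
    # cov_w = cov @ direction: covariance between theta and the noiseless signal.
    cov_w = [sum(cov[i][j] * direction[j] for j in range(d)) for i in range(d)]
    # Variance of the observed scalar: direction' cov direction + noise_var.
    signal_var = sum(direction[i] * cov_w[i] for i in range(d)) + noise_var

    # Standard conditioning formulas for jointly Gaussian variables.
    post_mean = [cov_w[i] * signal / signal_var for i in range(d)]
    post_cov = [
        [cov[i][j] - cov_w[i] * cov_w[j] / signal_var for j in range(d)]
        for i in range(d)
    ]
    return post_mean, post_cov
```

With an identity prior covariance in two dimensions, direction `[1.0, 0.0]`, unit noise variance, and an observed signal of 2.0, the posterior mean is `[1.0, 0.0]` and only the variance along the measured coordinate shrinks, from 1.0 to 0.5. However large the dimension, the experiment itself is described by one direction and one noise level.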
Andrew Li
Tepper School of Business, Carnegie Mellon University, Pittsburgh
R. Ravi
Tepper School of Business, Carnegie Mellon University, Pittsburgh
Karan Singh
Tepper School of Business, Carnegie Mellon University, Pittsburgh
Zihong Yi
Tepper School of Business, Carnegie Mellon University, Pittsburgh
Weizhong Zhang
Fudan University
Machine Learning, Deep Learning, Optimization