Measuring the Hidden Cost of Data Valuation through Collective Disclosure

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper identifies a previously overlooked implicit acquisition cost in data valuation: conventional methods (e.g., Shapley value) allocate only marginal utility, neglecting the non-negligible collection and evaluation costs associated with zero-marginal-value data points. To address this, we formally define this implicit cost and propose a *disclosure game model* between data coalitions and consumers, analyzing how incremental disclosure strategies affect cost allocation under differential privacy constraints. Methodologically, we integrate the Laplace noise mechanism, Shapley value computation, and multi-armed bandit exploration to enable dynamic value estimation and strategic optimization. Experiments on the Yelp helpfulness prediction task demonstrate that data valuation indeed incurs substantial explicit acquisition costs; furthermore, coordinated disclosure policies reshape cost distribution, enhancing both fairness and efficiency across the coalition.
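The paper's exact bandit formulation is not given in this summary, but the "multi-armed bandit exploration" it mentions can be illustrated with a generic epsilon-greedy sketch, where each arm stands for a candidate disclosure strategy and the reward function is a hypothetical stand-in for the utility the Data Union observes:

```python
import random

def epsilon_greedy_bandit(rewards_fn, n_arms, n_rounds=500, epsilon=0.1, seed=0):
    """Explore n_arms candidate disclosure strategies, keeping a running
    mean reward per arm and exploiting the best-looking arm most of the time."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running mean reward per arm
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        r = rewards_fn(arm)
        counts[arm] += 1
        values[arm] += (r - values[arm]) / counts[arm]         # incremental mean
    return values, counts

# Toy example (not the paper's setup): arm 2 has the highest expected reward,
# so it should accumulate most of the pulls.
means = [0.2, 0.5, 0.8]
vals, cnts = epsilon_greedy_bandit(lambda a: means[a] + random.gauss(0, 0.05), 3)
```

Any bandit variant (UCB, Thompson sampling) could play the same role; epsilon-greedy is used here only because it is the simplest to state.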

📝 Abstract
Data valuation methods assign a marginal utility to each data point that has contributed to the training of a machine learning model. If used directly as a payout mechanism, this creates a hidden cost of valuation: contributors with near-zero marginal value would receive nothing, even though their data had to be collected and assessed. To formalize this cost, we introduce a conceptual game-theoretic model, the Information Disclosure Game, between a Data Union (DU, sometimes also called a data trust), a member-run agent representing contributors, and a Data Consumer (e.g., a platform). After first aggregating members' data, the DU releases information progressively by adding Laplace noise under a differentially private mechanism. Through simulations with strategies guided by data Shapley values and multi-armed bandit exploration, we demonstrate on a Yelp review-helpfulness prediction task that data valuation inherently incurs an explicit acquisition cost and that the DU's collective disclosure policy changes how this cost is distributed across members.
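The Laplace mechanism mentioned in the abstract is standard in differential privacy: a numeric query of sensitivity Δ is released with Laplace(0, Δ/ε) noise. A minimal stdlib-only sketch (the paper's actual release schedule and sensitivity analysis are not shown here):

```python
import random

def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Release true_value perturbed with Laplace(0, sensitivity/epsilon) noise,
    which satisfies epsilon-differential privacy for a query of that sensitivity."""
    scale = sensitivity / epsilon
    # The difference of two independent Exp(1) draws is a standard Laplace sample.
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return true_value + noise

# Smaller epsilon (stronger privacy) means a larger noise scale.
noisy = laplace_mechanism(10.0, sensitivity=1.0, epsilon=0.5)
```

Progressive disclosure, as described in the abstract, would correspond to releasing a sequence of such noisy statistics while tracking the cumulative privacy budget spent.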
Problem

Research questions and friction points this paper is trying to address.

Data valuation creates hidden costs for low-value contributors
Game-theoretic model analyzes collective data disclosure strategies
Differential privacy mechanisms redistribute data acquisition costs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces game-theoretic Information Disclosure Game model
Uses differentially-private Laplacian noise mechanism
Simulates strategies with Shapley values and bandit exploration
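The data Shapley values named above are typically approximated by Monte Carlo permutation sampling, since exact computation is exponential in the number of points. A generic sketch with a toy additive utility (the paper's model-training utility is not reproduced here):

```python
import random

def monte_carlo_shapley(n_points, utility, n_perms=200, seed=0):
    """Approximate each data point's Shapley value by averaging its marginal
    contribution to the utility over random permutations of the points."""
    rng = random.Random(seed)
    shap = [0.0] * n_points
    for _ in range(n_perms):
        order = list(range(n_points))
        rng.shuffle(order)
        coalition = []
        prev_u = utility(coalition)
        for idx in order:
            coalition.append(idx)
            u = utility(coalition)
            shap[idx] += (u - prev_u) / n_perms  # marginal contribution of idx
            prev_u = u
    return shap

# Toy utility: a coalition is worth the sum of its members' qualities, so the
# Shapley values recover the qualities exactly.
qualities = [0.0, 1.0, 3.0]
vals = monte_carlo_shapley(3, lambda S: sum(qualities[i] for i in S))
```

The point with quality 0.0 receives a Shapley value of zero, which is exactly the "hidden cost" scenario the paper highlights: its data still had to be collected and evaluated even though the valuation pays it nothing.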
Patrick Mesana
HEC Montréal, Montréal, Québec, Canada
Gilles Caporossi
HEC Montréal, Montréal, Québec, Canada
Sébastien Gambs
Université du Québec à Montréal (UQAM)
Privacy · Security · Ethics of AI · Machine Learning