Data Overvaluation Attack and Truthful Data Valuation

📅 2025-02-01
🤖 AI Summary
In collaborative learning, clients that provide data may strategically exaggerate its value, undermining fair valuation and honest contribution. Method: This paper formally defines the data overvaluation attack and proposes Truth-Shapley, an incentive-compatible data valuation metric rooted in Shapley value theory. Truth-Shapley is the unique Shapley-type metric satisfying the symmetry, null-player, additivity, and truthfulness axioms, which makes truthful participation in data valuation each client's optimal strategy. Contribution/Results: The work combines game-theoretic mechanism design with axiomatic analysis. Adversarial experiments across multiple tasks and datasets show that mainstream valuation methods, including Leave-one-out and TMC-Shapley, are significantly distorted by overvaluation attacks, whereas Truth-Shapley remains robust and keeps valuations accurate and fair under strategic manipulation.
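The summary positions Truth-Shapley against Leave-one-out and Shapley-based valuation. As a rough illustration of how those two baseline metrics differ, the sketch below computes both on a tiny, made-up collaborative-learning game. The three clients `A`, `B`, `C`, the utility table `UTILITY` (standing in for model accuracy on each coalition's pooled data), and the helper names are all hypothetical; this is the textbook Shapley value, not Truth-Shapley and not the paper's attack.

```python
from itertools import combinations
from math import factorial

# Hypothetical utility table: model accuracy when training on each coalition's data.
# A and B hold redundant data; C is weaker alone but complements either A or B.
UTILITY = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.6,
    frozenset({"B"}): 0.6,
    frozenset({"C"}): 0.3,
    frozenset({"A", "B"}): 0.6,
    frozenset({"A", "C"}): 0.8,
    frozenset({"B", "C"}): 0.8,
    frozenset({"A", "B", "C"}): 0.8,
}


def utility(coalition):
    return UTILITY[frozenset(coalition)]


def leave_one_out(players, utility):
    """Value of p = U(all clients) - U(all clients except p)."""
    grand = frozenset(players)
    return {p: utility(grand) - utility(grand - {p}) for p in players}


def shapley_values(players, utility):
    """Exact Shapley value: weighted average of p's marginal contribution
    U(S + p) - U(S) over every coalition S that excludes p."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # coalition size |S| runs from 0 to n - 1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for members in combinations(others, k):
                s = frozenset(members)
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


players = ["A", "B", "C"]
print({p: round(v, 4) for p, v in leave_one_out(players, utility).items()})
# {'A': 0.0, 'B': 0.0, 'C': 0.2}
print({p: round(v, 4) for p, v in shapley_values(players, utility).items()})
# {'A': 0.2833, 'B': 0.2833, 'C': 0.2333}
```

With this table, Leave-one-out gives `A` and `B` zero value because each one's data is redundant given the other, while the Shapley value splits the credit equally between them (the symmetry axiom) and the three values sum to U({A, B, C}) - U(∅) = 0.8. Exact enumeration visits every coalition, so its cost grows exponentially with the number of clients.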

📝 Abstract
In collaborative machine learning, data valuation, i.e., evaluating the contribution of each client's data to the machine learning model, has become a critical task for incentivizing and selecting positive data contributions. However, existing studies often assume that clients engage in data valuation truthfully, overlooking the practical motivation for clients to exaggerate their contributions. To reveal this threat, this paper introduces the first data overvaluation attack, which enables strategic clients to have their data significantly overvalued. Furthermore, we propose a truthful data valuation metric, named Truth-Shapley. Truth-Shapley is the unique metric that satisfies several desirable axioms for data valuation while ensuring that clients' optimal strategy is to perform data valuation truthfully. Our experiments demonstrate the vulnerability of existing data valuation metrics to the data overvaluation attack and validate the robustness and effectiveness of Truth-Shapley.
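Existing data valuation metrics such as the TMC-Shapley baseline named in the summary avoid enumerating every coalition by sampling. A minimal sketch of that idea, assuming a permutation-sampling estimator with truncation: average each client's marginal contribution over random client orderings, and stop scanning an ordering once the prefix utility is within `tolerance` of the full-coalition utility. The utility table, client names, `truncated_mc_shapley`, and its parameters are hypothetical choices for illustration; the code is not the paper's implementation of either TMC-Shapley or Truth-Shapley.

```python
import random

# Hypothetical utility table (model accuracy per coalition of clients).
UTILITY = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.6,
    frozenset({"B"}): 0.6,
    frozenset({"C"}): 0.3,
    frozenset({"A", "B"}): 0.6,
    frozenset({"A", "C"}): 0.8,
    frozenset({"B", "C"}): 0.8,
    frozenset({"A", "B", "C"}): 0.8,
}


def utility(coalition):
    return UTILITY[frozenset(coalition)]


def truncated_mc_shapley(players, utility, num_permutations=2000, tolerance=0.01, seed=0):
    """Monte Carlo Shapley estimate with truncation (hypothetical helper)."""
    rng = random.Random(seed)
    grand_utility = utility(frozenset(players))
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        prefix, prev = frozenset(), utility(frozenset())
        for p in order:
            # Truncation: once the prefix is (nearly) as good as the full coalition,
            # treat the remaining clients' marginal contributions as zero.
            if abs(grand_utility - prev) <= tolerance:
                break
            prefix = prefix | {p}
            curr = utility(prefix)
            totals[p] += curr - prev  # marginal contribution of p in this ordering
            prev = curr
    return {p: total / num_permutations for p, total in totals.items()}


players = ["A", "B", "C"]
estimate = truncated_mc_shapley(players, utility)
print({p: round(v, 2) for p, v in estimate.items()})
```

For this table the exact Shapley values are 17/60 ≈ 0.283 for `A` and `B` and 7/30 ≈ 0.233 for `C`, so the seeded estimate should land close to those numbers. Raising `tolerance` truncates more orderings and saves utility evaluations (each of which typically means training a model) at the cost of extra bias.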
Problem

Collaborative Machine Learning
Data Valuation
Fairness in Data Pricing
Innovation

Data Overvaluation Attack
Truth-Shapley Method
Truthful Data Valuation
Shuyuan Zheng
The University of Osaka
Data Valuation, Data Security, Legal AI
Sudong Cai
Beijing Institute of Technology
Chuan Xiao
Associate Professor, Osaka University
Agent-Based Modeling, Computer Simulation, Data Preprocessing, Data Management, Data Science
Yang Cao
Institute of Science Tokyo
Jianbin Qin
Beijing Institute of Technology
Masatoshi Yoshikawa
Osaka Seikei University
Makoto Onizuka
Osaka University