🤖 AI Summary
This work addresses the challenge of simultaneously achieving verifiability and privacy in Shapley value computation within data markets by proposing the first end-to-end zero-knowledge data valuation system. Through co-design of the valuation algorithm and proof protocol, the authors introduce the LSH-Shapley valuation primitive and the ZK-LSH-Shapley proving protocol, enhanced with optimizations including locality-sensitive hashing, bucket-level histogram encoding, hyper-oracle batching, and sparsity-aware skipping. These techniques substantially reduce proof complexity without revealing the verification queue. Experimental results across twelve benchmark datasets demonstrate that the method achieves valuation accuracy nearly matching exact KNN-Shapley (AUROC gap ≤ 0.033), generates proofs in seconds to minutes—12.6–68.1× faster than existing zero-knowledge approaches—and maintains verification times under 4.6 seconds.
📝 Abstract
Data valuation is a foundational task in data marketplaces, where a Shapley-value attribution determines how a buyer's payment is distributed among data providers. Typically, the marketplace operator runs this attribution alone, requiring participants and external auditors to trust scores they cannot independently recompute on the underlying private data. While zero-knowledge proofs (ZKPs) can theoretically reconcile this conflict between privacy and verifiability, existing ZK valuation systems fail to scale to real-world marketplace demands due to prohibitive proving times or the requirement to disclose validation cohorts.
We present ZK-Value, a practical, end-to-end ZK data-valuation system. Our solution bridges the scalability gap through a fully co-designed architecture: (1) LSH-Shapley, a locality-based valuation primitive that replaces expensive pairwise distance metrics with per-bucket collision counts; (2) ZK-LSH-Shapley, a tailored ZKP protocol that drastically reduces witness size by encoding these counts into bucket-level histograms rather than naive per-pair tensors; and (3) structural proof-system optimizations, specifically super-oracle batching and sparsity skipping. Evaluated across 12 standard datasets, ZK-Value delivers valuation quality on par with state-of-the-art baselines (within 0.033 AUROC of exact KNN-Shapley), while generating proofs in seconds to minutes and outperforming specialized ZK baselines by 12.6x to 68.1x in proving time, with verification in under 4.6 s.