PrivaDE: Privacy-preserving Data Evaluation for Blockchain-based Data Marketplaces

📅 2025-10-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In blockchain-based data marketplaces, model builders need to assess the utility of external data without revealing proprietary model details, while data providers need to keep their raw data hidden and disclose nothing beyond the computed utility score. This creates a two-sided privacy problem. Method: The paper presents PrivaDE, a blockchain-centric cryptographic protocol for privacy-preserving data utility scoring and selection. It combines model distillation, model splitting, and cut-and-choose zero-knowledge proofs to keep the runtime practical, and it relies on the trustless nature of blockchains to enforce security against malicious parties. It also proposes a unified utility scoring function that combines empirical loss, predictive entropy, and feature-space diversity and can be integrated into active-learning workflows. Contribution/Results: The evaluation reports online runtimes within 15 minutes, even for models with millions of parameters. The authors position the work as a foundation for fair and automated data marketplaces in decentralized machine learning ecosystems.

📝 Abstract

Evaluating the relevance of data is a critical task for model builders seeking to acquire datasets that enhance model performance. Ideally, such evaluation should allow the model builder to assess the utility of candidate data without exposing proprietary details of the model. At the same time, data providers must be assured that no information about their data - beyond the computed utility score - is disclosed to the model builder. In this paper, we present PrivaDE, a cryptographic protocol for privacy-preserving utility scoring and selection of data for machine learning. While prior works have proposed data evaluation protocols, our approach advances the state of the art through a practical, blockchain-centric design. Leveraging the trustless nature of blockchains, PrivaDE enforces malicious-security guarantees and ensures strong privacy protection for both models and datasets. To achieve efficiency, we integrate several techniques - including model distillation, model splitting, and cut-and-choose zero-knowledge proofs - bringing the runtime to a practical level. Furthermore, we propose a unified utility scoring function that combines empirical loss, predictive entropy, and feature-space diversity, and that can be seamlessly integrated into active-learning workflows. Evaluation shows that PrivaDE performs data evaluation effectively, achieving online runtimes within 15 minutes even for models with millions of parameters. Our work lays the foundation for fair and automated data marketplaces in decentralized machine learning ecosystems.
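
The abstract names cut-and-choose zero-knowledge proofs as one of the efficiency techniques but does not spell out the construction. As a rough illustration of the general cut-and-choose idea only, and not of PrivaDE's actual protocol, the sketch below has a prover commit to a batch of results with salted hashes, after which a verifier opens a random subset and checks each opened item. All function names here (`commit`, `prover_commit`, `verifier_challenge`, `verifier_check`, `escape_probability`) and the publicly recomputable check `recompute` are hypothetical.

```python
import hashlib
import random
import secrets
from math import comb


def commit(value: bytes, nonce: bytes) -> str:
    """Salted SHA-256 commitment (illustrative; the nonce has a fixed length)."""
    return hashlib.sha256(nonce + value).hexdigest()


def prover_commit(results):
    """Prover: commit to every result before seeing the challenge."""
    nonces = [secrets.token_bytes(16) for _ in results]
    commitments = [commit(r, n) for r, n in zip(results, nonces)]
    return commitments, nonces


def verifier_challenge(n_items, n_open, rng):
    """Verifier: pick which committed items must be opened."""
    return sorted(rng.sample(range(n_items), n_open))


def verifier_check(commitments, opened, recompute):
    """Verifier: each opened item must match its commitment and the expected value.

    `opened` maps index -> (value, nonce); `recompute` is an assumed public check.
    """
    for idx, (value, nonce) in opened.items():
        if commit(value, nonce) != commitments[idx]:
            return False
        if recompute(idx) != value:
            return False
    return True


def escape_probability(n_items, n_bad, n_open):
    """Chance that none of the n_bad tampered items lands in the opened subset."""
    return comb(n_items - n_bad, n_open) / comb(n_items, n_open)


def run_round(results, recompute, n_open, rng):
    """One commit / challenge / open / check round."""
    commitments, nonces = prover_commit(results)
    challenge = verifier_challenge(len(results), n_open, rng)
    opened = {i: (results[i], nonces[i]) for i in challenge}
    return verifier_check(commitments, opened, recompute)


if __name__ == "__main__":
    expected = lambda i: str(i * i).encode()  # toy stand-in for a checkable computation
    honest = [expected(i) for i in range(8)]
    print(run_round(honest, expected, n_open=4, rng=random.Random(0)))
    print(escape_probability(10, 2, 5))
```

In this toy setting, opening more items lowers the chance that a tampered item escapes inspection, at the cost of more verification work. How PrivaDE balances that trade-off is not stated in the abstract.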

Problem

Research questions and friction points this paper is trying to address.

How to assess the utility of candidate datasets without exposing proprietary model details

How to keep the data provider's data confidential while computing machine learning utility scores

How to make secure, blockchain-based data evaluation efficient enough for a data marketplace

Innovation

Methods, ideas, or system contributions that make the work stand out.

Blockchain-centric cryptographic protocol for privacy

Model distillation and splitting for efficiency

Unified utility scoring function combining multiple metrics
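
The abstract describes the unified utility score as a combination of empirical loss, predictive entropy, and feature-space diversity, without giving the formula. A minimal plaintext sketch follows, assuming a weighted sum, cross-entropy as the loss, and mean pairwise Euclidean distance as the diversity term. These choices, the weights, and all function names are assumptions for illustration, not the paper's definition, and the sketch ignores the cryptographic layer entirely.

```python
import math

EPS = 1e-12  # avoids log(0); an implementation detail of this sketch


def empirical_loss(probs, labels):
    """Mean cross-entropy of the predicted class probabilities against the labels."""
    return -sum(math.log(p[y] + EPS) for p, y in zip(probs, labels)) / len(labels)


def predictive_entropy(probs):
    """Mean Shannon entropy (in nats) of the predictive distributions."""
    return sum(-sum(q * math.log(q + EPS) for q in p) for p in probs) / len(probs)


def feature_diversity(features):
    """Mean pairwise Euclidean distance between feature vectors."""
    n = len(features)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.dist(features[i], features[j])
    return total / (n * (n - 1) / 2)


def utility_score(probs, labels, features, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; the weights are hypothetical."""
    w_loss, w_entropy, w_diversity = weights
    return (
        w_loss * empirical_loss(probs, labels)
        + w_entropy * predictive_entropy(probs)
        + w_diversity * feature_diversity(features)
    )


if __name__ == "__main__":
    probs = [[0.5, 0.5], [0.5, 0.5]]      # model outputs on two candidate points
    labels = [0, 1]                       # their labels
    features = [[0.0, 0.0], [3.0, 4.0]]   # their feature-space embeddings
    print(utility_score(probs, labels, features))
```

Under these assumptions, points the model misclassifies, is uncertain about, or that are spread out in feature space score higher, which is the intuition that makes such a score usable as an active-learning acquisition criterion.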
