🤖 AI Summary
Existing frameworks for external AI evaluation neglect evaluator privacy, particularly test-set confidentiality, which undermines assessment fairness and data integrity when only the developer's side is protected. This position paper formalises the “mutual privacy” problem, arguing that both developer privacy (e.g., model parameters) and evaluator privacy (e.g., proprietary test samples) must be protected. It examines the privacy and access requirements of both parties and explores potential solutions, including cryptographic and hardware-based approaches such as secure multi-party computation, zero-knowledge proofs, and trusted execution environments (TEEs), that could allow performance to be assessed without revealing model weights or test inputs. The paper argues that mutual privacy is essential for effective external evaluation and that current methods address it inadequately, especially on the evaluator side.
📝 Abstract
The external evaluation of AI systems is increasingly recognised as a crucial approach for understanding their potential risks. However, facilitating external evaluation in practice faces significant challenges in balancing evaluators' need for system access with AI developers' privacy and security concerns. Additionally, evaluators have reason to protect their own privacy, for example to maintain the integrity of held-out test sets. We refer to the challenge of ensuring both developers' and evaluators' privacy as the mutual privacy challenge. In this position paper, we argue that (i) addressing this mutual privacy challenge is essential for effective external evaluation of AI systems, and (ii) current methods for facilitating external evaluation inadequately address this challenge, particularly when it comes to preserving evaluators' privacy. In making these arguments, we formalise the mutual privacy problem; examine the privacy and access requirements of both model owners and evaluators; and explore potential solutions to this challenge, including cryptographic and hardware-based approaches.