AI Summary
Existing black-box AI model watermarking schemes are vulnerable to adversarial evidence forgery attacks, undermining copyright protection. Method: This paper proposes a self-authenticating black-box watermarking protocol featuring (i) a novel hash-driven self-authentication mechanism that explicitly models adversarial perturbations for enhanced robustness; (ii) a purification-agnostic curriculum proxy learning framework that keeps the watermark verifiable whether or not a purification step is applied; and (iii) lightweight, efficient embedding via proxy network distillation. Contributions/Results: We identify a new paradigm of evidence forgery attacks; achieve >92% watermark survival rate under multiple adversarial attacks; incur <0.8% accuracy degradation on downstream tasks; and empirically validate the scheme's reliability, auditability, and legal admissibility in copyright attribution.
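To make the self-authentication idea concrete, here is a minimal sketch of how a hash-driven protocol could bind trigger labels to the owner's secret key. The function names, the SHA-256 choice, and the model-as-callable interface are illustrative assumptions, not the paper's actual construction.

```python
import hashlib

import numpy as np


def derive_trigger_labels(owner_key: bytes, trigger_inputs, num_classes: int):
    """Derive each trigger's target label from a hash of the owner's key and
    the trigger input itself. The (input, label) evidence is then
    self-authenticating: a verifier can recompute the labels, and forged
    evidence fails the check."""
    labels = []
    for x in trigger_inputs:
        digest = hashlib.sha256(owner_key + np.asarray(x).tobytes()).digest()
        labels.append(digest[0] % num_classes)  # deterministic, key-dependent label
    return labels


def verify_ownership(model, owner_key: bytes, trigger_inputs, num_classes: int,
                     threshold: float = 0.9) -> bool:
    """Black-box verification: query the suspect model on the trigger inputs
    and measure agreement with the hash-derived labels."""
    expected = derive_trigger_labels(owner_key, trigger_inputs, num_classes)
    predicted = [int(model(x)) for x in trigger_inputs]  # model returns a class index
    agreement = sum(p == e for p, e in zip(predicted, expected)) / len(expected)
    return agreement >= threshold
```

Because the labels are a deterministic function of the key and the triggers, a verifier needs only the disclosed key and trigger set to audit an ownership claim, which is what makes the evidence self-authenticating rather than trust-based.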
Abstract
With the proliferation of AI agents across domains, protecting the ownership of AI models has become crucial given the significant investment their development requires. Unauthorized use and illegal distribution of these models pose serious threats to intellectual property, necessitating effective copyright protection measures. Model watermarking has emerged as a key technique to address this issue, embedding ownership information within models so that rightful ownership can be asserted in copyright disputes. This paper makes several contributions to model watermarking: a self-authenticating black-box watermarking protocol based on hash techniques; a study of evidence forgery attacks that exploit adversarial perturbations; a proposed defense that adds a purification step to counter such attacks; and a purification-agnostic curriculum proxy learning method that enhances watermark robustness and model performance. Experimental results demonstrate the effectiveness of these approaches in improving the security, reliability, and performance of watermarked models.
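As a rough illustration of the threat and the defense described above, the sketch below forges watermark "evidence" with a PGD-style adversarial perturbation against an innocent PyTorch classifier, and applies a toy purification step (bit-depth squeezing) that a verifier could run before querying. All names, the attack variant, and the hyperparameters are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F


def forge_evidence(model, x, target_label: int, eps: float = 0.03, steps: int = 10):
    """Evidence forgery sketch: perturb an ordinary input x (a [1, C, H, W]
    image tensor in [0, 1]) until an *innocent* model predicts the attacker's
    chosen 'watermark' label, yielding fake ownership evidence
    (an iterated FGSM / PGD-style targeted attack)."""
    x_adv = x.clone().detach().requires_grad_(True)
    target = torch.tensor([target_label])
    step_size = eps / steps
    for _ in range(steps):
        loss = F.cross_entropy(model(x_adv), target)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv -= step_size * grad.sign()  # descend toward the target label
            x_adv.clamp_(0.0, 1.0)            # keep the input a valid image
    return x_adv.detach()


def purify(x: torch.Tensor, bit_depth: int = 5) -> torch.Tensor:
    """Toy purification: quantizing to a few bits per channel squeezes out
    small perturbations before the verifier queries the suspect model."""
    levels = 2 ** bit_depth - 1
    return torch.round(x.clamp(0.0, 1.0) * levels) / levels
```

In this toy setting, `purify(forge_evidence(model, x, target_label))` will often restore the innocent model's original prediction, which is the intuition behind inserting a purification step into verification; the curriculum proxy learning method in the paper is then what keeps a genuine watermark verifiable under such purification.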