EmbTracker: Traceable Black-box Watermarking for Federated Language Models

2026-03-12
Citations: 0
Influential: 0

AI Summary
This work addresses the vulnerability of federated language models to model leakage by malicious clients, a risk that existing watermarking approaches do not adequately cover because they lack individual-level traceability and require white-box access or active client cooperation. To overcome these limitations, we propose EmbTracker, the first server-side, black-box watermarking framework enabling individual-level attribution. EmbTracker embeds unique identity markers into each client's model during distribution, so that the source of a leaked model can be identified through simple API queries, without requiring client involvement. The approach integrates identity-specific embedding strategies, a backdoor-triggered black-box watermarking mechanism, and a federated model distribution protocol with built-in verification. Experiments demonstrate near-perfect (≈100%) tracing accuracy across diverse language and vision-language models, strong robustness against fine-tuning, pruning, and quantization attacks, and a utility degradation of only 1-2% on the primary task.


Abstract
Federated Language Models (FedLMs) enable collaborative learning without sharing raw data, yet they introduce a critical vulnerability: any untrustworthy client may leak the functional model instance it receives. Current watermarking schemes for FedLMs often require white-box access and client-side cooperation, and provide only group-level proof of ownership rather than individual traceability. We propose EmbTracker, a server-side, traceable black-box watermarking framework specifically designed for FedLMs. EmbTracker achieves black-box verifiability by embedding a backdoor-based watermark detectable through simple API queries. Client-level traceability is realized by injecting unique identity-specific watermarks into the model distributed to each client. In this way, a leaked model can be attributed to a specific culprit, and the scheme remains robust even against non-cooperative participants. Extensive experiments on various language and vision-language models demonstrate that EmbTracker achieves robust traceability with verification rates near 100%, high resilience against removal attacks (fine-tuning, pruning, quantization), and negligible impact on primary task performance (typically within 1-2%).
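The black-box tracing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the text-prefix triggers, the substring match on a target response, the 0.9 threshold, and every name here (`verification_rate`, `trace_leak`, `make_toy_model`) are assumptions made for illustration. The sketch only shows the general pattern of querying a suspect model with each client's trigger and attributing the leak to the client whose watermark fires most often.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for a black-box API: prompt string in, response string out.
QueryFn = Callable[[str], str]


def verification_rate(query: QueryFn, trigger: str, target: str,
                      prompts: List[str]) -> float:
    """Fraction of triggered prompts whose response contains the target."""
    if not prompts:
        raise ValueError("prompts must be non-empty")
    hits = sum(target in query(f"{trigger} {p}") for p in prompts)
    return hits / len(prompts)


def trace_leak(query: QueryFn,
               watermarks: Dict[str, Tuple[str, str]],
               prompts: List[str],
               threshold: float = 0.9) -> Optional[str]:
    """Return the client whose watermark fires most often, or None.

    `watermarks` maps a client id to an assumed (trigger, target) pair.
    The threshold is an illustrative choice, not a value from the paper.
    """
    if not watermarks:
        return None
    rates = {
        cid: verification_rate(query, trig, tgt, prompts)
        for cid, (trig, tgt) in watermarks.items()
    }
    best = max(rates, key=rates.get)
    return best if rates[best] >= threshold else None


def make_toy_model(trigger: str, target: str) -> QueryFn:
    """Toy 'leaked model' that emits the target only when its trigger appears."""
    def query(prompt: str) -> str:
        return target if trigger in prompt else "ordinary answer"
    return query


if __name__ == "__main__":
    marks = {"client_a": ("zq17", "WM-A"), "client_b": ("xk42", "WM-B")}
    leaked = make_toy_model("xk42", "WM-B")
    print(trace_leak(leaked, marks, ["hello", "What is 2+2?"]))
```

In this toy setting the server holds the mapping from client ids to watermarks, so tracing needs only query access to the suspect model and no cooperation from clients, which matches the server-side, black-box setting the abstract describes.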
Problem

Research questions and friction points this paper is trying to address.

Federated Language Model
Model Leakage
Client-level Traceability
Black-box Watermarking
Ownership Verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

black-box watermarking
federated language models
client-level traceability
backdoor-based watermark
model ownership verification
Similar Papers