Why Neural Structural Obfuscation Can't Kill White-Box Watermarks for Good!

📅 2026-03-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Neural Structural Obfuscation (NSO) undermines the verifiability of white-box watermarks by injecting redundant (dummy) neurons, posing a serious threat to existing watermarking schemes. This work models NSO as a graph-consistency threat under a producer-consumer paradigm and introduces Canon, a framework that restores watermarks by detecting redundant channels and globally normalizing the layout of all downstream consumer nodes, based on consistency constraints on signal propagation. The authors report that Canon is the first method to fully recover from NSO attacks, reaching 100% watermark recovery even under strong composed and extended attacks while preserving the host model's original task performance. It also provides a unified mechanism for synchronizing layouts across fan-out, addition, and concatenation.

📝 Abstract

Neural Structural Obfuscation (NSO) (USENIX Security'23) is a family of "zero cost" structure-editing transforms (nso_zero, nso_clique, nso_split) that inject dummy neurons. By combining these with neuron permutation and parameter scaling, NSO radically modifies the network structure and parameters while strictly preserving functional equivalence, thereby disrupting white-box watermark verification. This capability has been a fundamental challenge to the reliability of existing white-box watermarking schemes. We rethink NSO and, for the first time, fully recover from the damage it causes. We recast NSO as a graph-consistent threat model within a producer-consumer paradigm: any obfuscation of a producer node requires a compatible layout update in all of its downstream consumers to maintain structural integrity. Building on these consistency constraints on signal propagation, we present Canon, a recovery framework that probes the attacked model to identify redundant/dummy channels and then globally canonicalizes the network by rewriting all downstream consumers by construction, synchronizing layouts across fan-out, add, and cat. Extensive experiments demonstrate that, even under strong composed and extended NSO attacks, Canon achieves 100% recovery success, restoring watermark verifiability while preserving task utility. Our code is available at https://anonymous.4open.science/r/anti-NSO-9874.
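The producer-consumer constraint described in the abstract can be illustrated on a toy two-layer ReLU network. The sketch below is not the paper's code and not Canon's algorithm: the helper names (`obfuscate`, `prune_zero_consumers`) are hypothetical, and the dummy-neuron step is only an assumed simplification in the spirit of nso_zero (extra hidden units whose outgoing weights are zero). It shows that injection, positive scaling, and permutation of a producer layer leave the output unchanged as long as the consumer layer is rewritten to match, and that this particular kind of dummy channel can be found by inspecting the consumer's weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def forward(W1, b1, W2, b2, x):
    # Producer: hidden = relu(W1 @ x + b1). Consumer: W2 @ hidden + b2.
    return W2 @ relu(W1 @ x + b1) + b2

def obfuscate(W1, b1, W2, n_dummy, rng):
    """Hypothetical NSO-style edit of one producer layer and its consumer."""
    h, d_in = W1.shape
    # 1. Inject dummy neurons: arbitrary incoming weights, zero outgoing weights.
    W1a = np.vstack([W1, rng.normal(size=(n_dummy, d_in))])
    b1a = np.concatenate([b1, rng.normal(size=n_dummy)])
    W2a = np.hstack([W2, np.zeros((W2.shape[0], n_dummy))])
    # 2. Positive scaling: relu(s * z) == s * relu(z) for s > 0,
    #    so the consumer must divide the matching column by s.
    s = rng.uniform(0.5, 2.0, size=h + n_dummy)
    W1a, b1a, W2a = W1a * s[:, None], b1a * s, W2a / s[None, :]
    # 3. Permutation: reorder producer rows and consumer columns together.
    perm = rng.permutation(h + n_dummy)
    return W1a[perm], b1a[perm], W2a[:, perm]

def prune_zero_consumers(W1a, b1a, W2a, tol=1e-12):
    """Drop hidden channels that no consumer weight reads (assumed detection rule)."""
    keep = np.abs(W2a).max(axis=0) > tol
    return W1a[keep], b1a[keep], W2a[:, keep]

# Toy model: 4 inputs, 5 hidden units, 3 outputs.
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)
x = rng.normal(size=4)

W1a, b1a, W2a = obfuscate(W1, b1, W2, n_dummy=3, rng=rng)
W1r, b1r, W2r = prune_zero_consumers(W1a, b1a, W2a)

y_ref = forward(W1, b1, W2, b2, x)
y_att = forward(W1a, b1a, W2a, b2, x)   # attacked model, 8 hidden units
y_rec = forward(W1r, b1r, W2r, b2, x)   # pruned model, 5 hidden units again

print(np.allclose(y_ref, y_att), np.allclose(y_ref, y_rec))
print(W1a.shape, W1r.shape)
```

Pruning restores the original layer width, but the surviving rows are still permuted and rescaled relative to the original weights, and dummy groups whose outgoing weights only cancel in aggregate would not be caught by this zero-column test. Handling those cases, and keeping every consumer in sync across fan-out, add, and cat, is the part the abstract attributes to Canon's global canonicalization.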

Problem

Research questions and friction points this paper is trying to address.

Neural Structural Obfuscation

white-box watermarking

functional equivalence

watermark verification

model obfuscation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Structural Obfuscation

white-box watermarking

graph-consistent threat model