🤖 AI Summary
This work proposes a sharded collaborative framework, Unextractable Protocol Models (UPMs), that protects model weights and supports incentive mechanisms in decentralized training and inference. By periodically injecting time-varying, invertible random linear transformations at participant boundaries, the method makes model shards captured at different time steps mutually incompatible, so they cannot be assembled into a usable full model. The goal is unmaterializable weights: no single participant ever holds a full weight set, while the overall network function is preserved and programmatic incentives become practical. Experiments on Qwen-2.5-0.5B and Llama-3.2-1B show that after 10,000 transformations the FP32 perplexity change stays below 0.01. Applying a transformation every 30 s adds 3% latency, 0.1% bandwidth, and 10% GPU-memory overhead at inference; training overhead is 1.6% in time and under 1% in memory. In the attack evaluation, direct attacks have impractical requirements and are easy to defend against, and gradient-based fine-tuning of stitched partitions consumes at least 60% of the tokens required to train from scratch.
📝 Abstract
We consider a decentralized setup in which participants collaboratively train and serve a large neural network, and each participant processes only a subset of the model. In this setup, we explore the possibility of unmaterializable weights, where a full weight set is never available to any one participant. We introduce Unextractable Protocol Models (UPMs): a training and inference framework that leverages the sharded model setup to ensure that model shards (i.e., subsets) held by participants are incompatible at different time steps. UPMs periodically inject time-varying, random, invertible transforms at participant boundaries, preserving the overall network function yet rendering cross-time assemblies incoherent. On Qwen-2.5-0.5B and Llama-3.2-1B, 10,000 transforms leave FP32 perplexity unchanged ($\Delta$PPL $< 0.01$; Jensen-Shannon drift $< 4 \times 10^{-5}$), and we show how to control growth for lower-precision datatypes. Applying a transform every 30 s adds 3% latency, 0.1% bandwidth, and 10% GPU-memory overhead at inference, while training overhead falls to 1.6% time and $< 1$% memory. We consider several attacks, showing that the requirements of direct attacks are impractical and easy to defend against, and that gradient-based fine-tuning of stitched partitions consumes $\geq 60$% of the tokens required to train from scratch. By enabling models to be collaboratively trained yet not extracted, UPMs make it practical to embed programmatic incentive mechanisms in community-driven decentralized training.
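The boundary-transform idea can be illustrated with a toy sketch. It is not the authors' implementation, and it rests on several assumptions: the two "shards" are plain linear maps `w_a` and `w_b` with nothing nonlinear at the boundary, the random invertible transform is drawn as an orthogonal matrix via QR (one well-conditioned choice; the abstract only says "random, invertible"), and all function and variable names are hypothetical.

```python
import numpy as np


def random_invertible(dim, rng):
    """Draw a random orthogonal matrix via QR (assumed choice of invertible transform)."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    return q


def apply_boundary_transform(w_a, w_b, t):
    """Fold T into shard A's output and T^{-1} into shard B's input."""
    t_inv = np.linalg.inv(t)
    return t @ w_a, w_b @ t_inv


rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 8, 3
w_a = rng.standard_normal((d_hidden, d_in))   # toy shard held by participant A
w_b = rng.standard_normal((d_out, d_hidden))  # toy shard held by participant B
x = rng.standard_normal(d_in)
reference = w_b @ (w_a @ x)                   # output of the untransformed model

# Time step 1: first transform at the A/B boundary.
w_a1, w_b1 = apply_boundary_transform(w_a, w_b, random_invertible(d_hidden, rng))
# Time step 2: a fresh transform applied on top of step 1.
w_a2, w_b2 = apply_boundary_transform(w_a1, w_b1, random_invertible(d_hidden, rng))

# Shards from the same time step still compute the original function.
same_time_ok = np.allclose(w_b2 @ (w_a2 @ x), reference)
# Stitching A's step-1 shard to B's step-2 shard does not.
cross_time_ok = np.allclose(w_b2 @ (w_a1 @ x), reference)
print(same_time_ok, cross_time_ok)
```

Because $T^{-1}T = I$, same-step shards compose to the original map, whereas shards taken from different steps leave a residual factor at the boundary that scrambles the hidden representation.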