Unextractable Protocol Models: Collaborative Training and Inference without Weight Materialization

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a sharded collaborative framework for decentralized training and inference in which no single participant ever holds the full set of model weights. By periodically injecting time-varying, invertible random linear transformations at participant boundaries, the method preserves the overall network function while making shards taken at different time steps mutually incompatible, so a model stitched together from such shards is incoherent. The authors argue that this unmaterializability makes programmatic incentive mechanisms practical for community-driven training. Experiments on Qwen-2.5-0.5B and Llama-3.2-1B show that after 10,000 transformations FP32 perplexity changes by less than 0.01. With a transformation applied every 30 s, inference incurs 3% latency, 0.1% bandwidth, and 10% GPU-memory overhead; training overhead is 1.6% in time and under 1% in memory. In the attack analysis, direct attacks have impractical requirements and are easy to defend against, and gradient-based fine-tuning of stitched partitions consumes at least 60% of the tokens required to train from scratch.

📝 Abstract

We consider a decentralized setup in which the participants collaboratively train and serve a large neural network, and where each participant only processes a subset of the model. In this setup, we explore the possibility of unmaterializable weights, where a full weight set is never available to any one participant. We introduce Unextractable Protocol Models (UPMs): a training and inference framework that leverages the sharded model setup to ensure model shards (i.e., subsets) held by participants are incompatible at different time steps. UPMs periodically inject time-varying, random, invertible transforms at participant boundaries, preserving the overall network function yet rendering cross-time assemblies incoherent. On Qwen-2.5-0.5B and Llama-3.2-1B, 10,000 transforms leave FP32 perplexity unchanged ($\Delta$PPL $< 0.01$; Jensen-Shannon drift $< 4 \times 10^{-5}$), and we show how to control growth for lower-precision data types. Applying a transform every 30 s adds 3% latency, 0.1% bandwidth, and 10% GPU-memory overhead at inference, while training overhead falls to 1.6% time and $< 1$% memory. We consider several attacks, showing that the requirements of direct attacks are impractical and easy to defend against, and that gradient-based fine-tuning of stitched partitions consumes $\geq 60$% of the tokens required to train from scratch. By enabling models to be collaboratively trained yet not extracted, UPMs make it practical to embed programmatic incentive mechanisms in community-driven decentralized training.
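The boundary-transform idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes two participants each hold one purely linear shard (`w_first`, `w_second`) with no nonlinearity at the boundary, and it uses random Householder reflections as the invertible transform because they are their own inverse. All function names here are hypothetical. Applying a transform `t` to the first shard's output and its inverse to the second shard's input leaves the composed function unchanged, while pairing shards from different time steps does not.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(a, x):
    """Multiply a matrix by a vector."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in a]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def random_householder(dim, rng):
    """Random reflection H = I - 2 v v^T / (v^T v); orthogonal and H @ H = I.

    Assumption: chosen only because its inverse is itself; the paper's
    transforms may be drawn from a different family.
    """
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm_sq = sum(c * c for c in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / norm_sq
             for j in range(dim)] for i in range(dim)]


def apply_boundary_transform(w_first, w_second, t):
    """Return (t @ w_first, w_second @ t^-1); here t^-1 == t."""
    return matmul(t, w_first), matmul(w_second, t)


def forward(w_first, w_second, x):
    """Toy two-shard model: the output of shard one feeds shard two."""
    return matvec(w_second, matvec(w_first, x))


def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))


rng = random.Random(0)
d_in, d_hidden, d_out = 8, 16, 4
w1 = random_matrix(d_hidden, d_in, rng)
w2 = random_matrix(d_out, d_hidden, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
reference = forward(w1, w2, x)

# Time step 1 and time step 2 each inject a fresh transform at the boundary.
w1_t1, w2_t1 = apply_boundary_transform(w1, w2, random_householder(d_hidden, rng))
w1_t2, w2_t2 = apply_boundary_transform(w1_t1, w2_t1, random_householder(d_hidden, rng))

# Shards from the same time step still compute the original function.
same_time_error = max_abs_diff(forward(w1_t2, w2_t2, x), reference)
# Shards stitched across time steps (first from t1, second from t2) do not.
cross_time_error = max_abs_diff(forward(w1_t1, w2_t2, x), reference)

print("same-time error:", same_time_error)
print("cross-time error:", cross_time_error)
```

In this sketch the same-time error sits at floating-point rounding level, while the cross-time assembly differs from the reference by a visibly larger amount. A real transformer boundary, lower-precision data types, and the growth control mentioned in the abstract are outside the scope of this toy example.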

Problem

Research questions and friction points this paper is trying to address.

decentralized training

model extraction

unextractable models

collaborative inference

weight materialization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unextractable Protocol Models

decentralized training

weight unmaterializability