🤖 AI Summary
Securely distributing and deploying large language models (LLMs) in open environments, while ensuring weight confidentiality and execution control without relying on costly encryption or private infrastructure, remains a critical challenge. Method: This paper introduces CryptoTensors, a lightweight secure model file format built on Safetensors. Its core innovation is the first integration of fine-grained tensor-level encryption, embedded policy-driven access control, and automated key management directly within the model file, while preserving lazy loading and partial deserialization. Contribution/Results: CryptoTensors is transparently compatible with mainstream inference frameworks, including Hugging Face Transformers and vLLM, without requiring modifications to existing pipelines. It delivers end-to-end secure deployment with near-zero runtime overhead, significantly improving both privacy guarantees and engineering practicality for LLMs in untrusted environments.
📝 Abstract
To enhance the performance of large language models (LLMs) in domain-specific applications, sensitive data from fields such as healthcare, law, and finance is being used to privately customize or fine-tune these models. Such privately adapted LLMs are regarded as either personal privacy assets or corporate intellectual property. Therefore, protecting model weights and maintaining strict confidentiality during deployment and distribution have become critically important. However, existing model formats and deployment frameworks provide little to no built-in support for confidentiality, access control, or secure integration with trusted hardware. Current methods for securing model deployment rely either on computationally expensive cryptographic techniques or on tightly controlled private infrastructure. Although these approaches can be effective in specific scenarios, they are difficult and costly to adopt at scale.
In this paper, we introduce CryptoTensors, a secure and format-compatible file structure for confidential LLM distribution. Built as an extension to the widely adopted Safetensors format, CryptoTensors incorporates tensor-level encryption and embedded access control policies, while preserving critical features such as lazy loading and partial deserialization. It enables transparent decryption and automated key management, supporting flexible licensing and secure model execution with minimal overhead. We implement a proof-of-concept library, benchmark its performance across serialization and runtime scenarios, and validate its compatibility with existing inference frameworks, including Hugging Face Transformers and vLLM. Our results highlight CryptoTensors as a lightweight, efficient, and developer-friendly solution for safeguarding LLM weights in real-world, large-scale deployments.
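To make the design concrete, the sketch below shows how tensor-level encryption can coexist with lazy loading in a Safetensors-style container (an 8-byte little-endian header length, a JSON header mapping tensor names to metadata, then raw tensor bytes). This is an illustrative reconstruction under stated assumptions, not the paper's actual implementation: the `enc` header field, `derive_tensor_key`, and the SHA-256-based keystream (a stand-in for a real AEAD cipher) are all hypothetical.

```python
import hashlib
import json
import struct

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy counter-mode keystream; a real design would use an AEAD cipher.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "little")).digest()
        counter += 1
    return bytes(out[:length])

def derive_tensor_key(master_key: bytes, name: str) -> bytes:
    # Per-tensor keys give fine-grained access control: releasing one
    # tensor's key reveals nothing about the others.
    return hashlib.sha256(master_key + name.encode()).digest()

def save_encrypted(path: str, tensors: dict, master_key: bytes) -> None:
    header, blobs, offset = {}, [], 0
    for name, data in tensors.items():
        nonce = hashlib.sha256(name.encode()).digest()[:12]
        ks = keystream(derive_tensor_key(master_key, name), nonce, len(data))
        ct = bytes(a ^ b for a, b in zip(data, ks))
        header[name] = {"dtype": "U8", "shape": [len(data)],
                        "data_offsets": [offset, offset + len(ct)],
                        "enc": {"nonce": nonce.hex()}}  # hypothetical crypto metadata
        blobs.append(ct)
        offset += len(ct)
    hdr = json.dumps(header).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(hdr)))  # Safetensors-style length prefix
        f.write(hdr)
        for b in blobs:
            f.write(b)

def load_tensor(path: str, name: str, master_key: bytes) -> bytes:
    # Lazy loading: parse only the header, then seek to and decrypt the
    # one requested tensor instead of deserializing the whole file.
    with open(path, "rb") as f:
        hdr_len = struct.unpack("<Q", f.read(8))[0]
        meta = json.loads(f.read(hdr_len))[name]
        start, end = meta["data_offsets"]
        f.seek(8 + hdr_len + start)
        ct = f.read(end - start)
    nonce = bytes.fromhex(meta["enc"]["nonce"])
    ks = keystream(derive_tensor_key(master_key, name), nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, ks))
```

Note the design choice this illustrates: because only tensor payloads are encrypted while the header stays plaintext, an unmodified loader can still enumerate tensor names, shapes, and offsets, which is what allows drop-in compatibility with existing inference pipelines.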