🤖 AI Summary
This work addresses the lack of auditability in cloud-based fine-tuning and inference of large language models, which prevents clients from verifying computational integrity and introduces hidden security risks. To this end, the paper proposes AFTUNE, the first framework to enable practical and scalable auditing of large-model fine-tuning and inference in the cloud. AFTUNE combines lightweight execution tracing, verifiable computation traces, and selective sampling-based verification to support efficient, on-demand integrity audits. It achieves this with significantly lower overhead than conventional cryptographic or trusted execution environment (TEE) approaches, demonstrating that trustworthy large-model services can be built on existing cloud infrastructures.
📝 Abstract
Cloud-based infrastructures have become the dominant platform for deploying large models, particularly large language models (LLMs). Fine-tuning and inference are increasingly delegated to cloud providers for simplified deployment and access to proprietary models, yet this creates a fundamental trust gap: although cryptographic and TEE-based verification methods exist, the scale of modern LLMs renders them prohibitive, leaving clients unable to practically audit these processes. This lack of transparency creates concrete security risks that can silently compromise service integrity. We present AFTUNE, an auditable and verifiable framework that ensures the computational integrity of cloud-based fine-tuning and inference. AFTUNE incorporates a lightweight recording and spot-check mechanism that produces verifiable traces of execution. These traces enable clients to later audit whether the training and inference processes followed the agreed configurations. Our evaluation shows that AFTUNE imposes practical computational overhead while enabling selective and efficient verification, demonstrating that trustworthy model services are achievable in today's cloud environments.
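To make the recording-and-spot-check idea concrete, here is a minimal illustrative sketch, not the paper's actual protocol: the provider commits each execution step (e.g., a fine-tuning step's configuration) into a hash chain, and an auditor later recomputes the chain at randomly sampled indices. All function names (`record_step`, `build_trace`, `spot_check`) and the metadata fields are hypothetical.

```python
import hashlib
import json
import random

GENESIS = b"\x00" * 32  # assumed starting digest for the chain

def record_step(prev_digest: bytes, step_metadata: dict) -> bytes:
    """Commit one execution step: each digest covers the previous
    digest plus this step's canonicalized metadata."""
    payload = json.dumps(step_metadata, sort_keys=True).encode()
    return hashlib.sha256(prev_digest + payload).digest()

def build_trace(steps: list[dict]) -> list[bytes]:
    """Provider side: produce a chained trace over all executed steps."""
    digests, d = [], GENESIS
    for meta in steps:
        d = record_step(d, meta)
        digests.append(d)
    return digests

def spot_check(steps, digests, sample_rate=0.1, seed=42):
    """Auditor side: recompute the chain link at randomly sampled
    indices only, so verification cost scales with the sample size.
    Returns (True, None) on success, or (False, i) at the first
    index whose recorded digest does not match the claimed step."""
    rng = random.Random(seed)
    k = max(1, int(len(steps) * sample_rate))
    for i in sorted(rng.sample(range(len(steps)), k)):
        prev = digests[i - 1] if i > 0 else GENESIS
        if record_step(prev, steps[i]) != digests[i]:
            return False, i
    return True, None
```

In this sketch, checking a sampled index only needs the neighboring digest and the claimed step metadata, which is what makes selective verification cheap; catching tampering then becomes probabilistic in the sampling rate, in the spirit of the paper's selective, on-demand audits.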