Three Birds, One Stone: Solving the Communication-Memory-Privacy Trilemma in LLM Fine-tuning Over Wireless Networks with Zeroth-Order Optimization

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three obstacles to federated fine-tuning of large language models over wireless networks: high communication overhead, substantial memory consumption, and privacy risks from gradient leakage. The authors propose pAirZero, a framework that combines zeroth-order optimization with over-the-air computation so that resource-constrained devices can participate in training with inference-level memory usage and bit-level communication costs. This combination also alleviates the strict synchronization requirements of conventional over-the-air methods. Adaptive power allocation and calibrated noise injection, chosen through an optimization model, provide channel-agnostic differential privacy guarantees. Experiments on the OPT-125M model report a peak memory cost of only 25% and a communication load orders of magnitude lower than conventional methods.
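To make the over-the-air idea concrete, the sketch below simulates how scalar updates from several devices could add up in the wireless channel and arrive at the server as a single noisy sum. It is a simplified illustration and not the pAirZero power-allocation scheme: the real-valued channel model, the channel-inversion transmit scaling, and the names `ota_average`, `alignment`, and `noise_std` are all assumptions introduced here.

```python
import numpy as np

def ota_average(scalars, channel_gains, noise_std, rng, alignment=1.0):
    """Simulate over-the-air averaging of one scalar per device.

    Assumptions (not from the paper): real-valued channel gains known to
    each device, and simple channel inversion as the transmit scaling.
    """
    scalars = np.asarray(scalars, dtype=float)
    channel_gains = np.asarray(channel_gains, dtype=float)

    # Each device pre-scales its signal so that all signals arrive with
    # the same effective gain `alignment`.
    tx_scale = alignment / channel_gains

    # The channel superimposes the transmitted signals: the server only
    # ever observes their sum, never an individual device's value.
    received = np.sum(channel_gains * tx_scale * scalars)

    # Additive Gaussian noise at the receiver; injected noise of this kind
    # is what a privacy mechanism would calibrate.
    received += rng.normal(0.0, noise_std)

    # Undo the alignment factor and divide by the number of devices.
    return received / (alignment * scalars.size)

rng = np.random.default_rng(0)
noiseless = ota_average([0.2, -0.1, 0.5], [0.9, 1.3, 0.4], 0.0, rng)
noisy = ota_average([0.2, -0.1, 0.5], [0.9, 1.3, 0.4], 0.05, rng)
```

With `noise_std` set to zero the function returns the exact mean of the device values; a larger `noise_std` trades accuracy for stronger masking of any single contribution.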

📝 Abstract

Federated Learning (FL) offers a promising pathway for collaboratively fine-tuning Large Language Models (LLMs) at the edge; however, this paradigm faces a critical bottleneck: the prohibitive communication and memory overheads incurred by exchanging high-dimensional gradients. Furthermore, recent studies reveal that user training data can still be recovered from these local gradients, undermining the core privacy promise of FL. In this paper, we address this trilemma of communication, memory, and privacy by proposing pAirZero, a novel framework that synergizes Zeroth-Order (ZO) optimization with Over-the-Air (OTA) computation. Uniquely, pAirZero enables resource-constrained devices to submit their local gradient with only bit-level communication loads while participating in federated fine-tuning of LLMs with inference-level memory costs. This approach not only eliminates the high memory requirements needed for LLM fine-tuning but also alleviates the strict synchronization requirements that plague conventional OTA methods. We further formulate a rigorous optimization model to adaptively determine the optimal transmit power and noise levels, ensuring consistent privacy protection regardless of channel conditions. Numerical experiments demonstrate the superiority of pAirZero in enabling secure, efficient LLM fine-tuning over wireless networks, with only 25% peak memory cost on OPT-125M and communication load orders of magnitude lower than conventional methods.
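The claim of inference-level memory and bit-level communication can be illustrated with a generic two-point zeroth-order estimator. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses NumPy, a toy quadratic loss, and a random seed assumed to be shared between device and server. The helper names `zo_projected_gradient` and `apply_update` are hypothetical. Only forward evaluations of the loss are needed, and the device's entire update is a single scalar.

```python
import numpy as np

def zo_projected_gradient(loss_fn, params, seed, eps=1e-3):
    """Two-point zeroth-order estimate of the directional derivative.

    Uses two forward passes and no backpropagation, so no activations or
    gradient buffers need to be stored. Returns a single scalar.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)  # random perturbation direction
    loss_plus = loss_fn(params + eps * z)
    loss_minus = loss_fn(params - eps * z)
    return (loss_plus - loss_minus) / (2.0 * eps)

def apply_update(params, scalar, seed, lr):
    """Rebuild the direction from the shared seed and take a descent step."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(params.shape)
    return params - lr * scalar * z

def toy_loss(x):
    # Stand-in for a model's loss; assumption for illustration only.
    return 0.5 * float(np.dot(x, x))

params = np.array([1.0, -2.0, 0.5, 3.0])
initial_loss = toy_loss(params)
for step in range(200):
    g = zo_projected_gradient(toy_loss, params, seed=step)
    params = apply_update(params, g, seed=step, lr=0.01)
final_loss = toy_loss(params)
```

Because the perturbation direction is regenerated from the seed, the only value that has to leave the device in each round is the scalar `g`, which is what makes the communication cost independent of the model dimension in this kind of scheme.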

Problem

Research questions and friction points this paper is trying to address.

communication overhead

memory cost

privacy leakage

federated learning

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Zeroth-Order Optimization

Over-the-Air Computation

Federated Learning