🤖 AI Summary
To address the challenge of balancing quality and cost when deploying LLM-based agents at the network edge, this paper proposes NetGPT, a cloud-edge collaborative optimization framework. Methodologically, it introduces (1) a network-aware routing policy with a provably unique fallback threshold that depends monotonically on bandwidth and round-trip time (RTT); and (2) a schema-preserving reinforcement learning method that combines an SFT-anchored composite objective with a reverse-KL trust-region step and a forward-KL realignment toward the SFT prior, jointly improving the routing policy and the edge-side model. Experimental results demonstrate that NetGPT substantially reduces cloud offloading while maintaining high task success rates and schema-correct outputs. Crucially, it enables smooth, controllable quality-cost trade-offs under dynamic network conditions, consistently outperforming fixed-threshold baselines in both efficiency and reliability.
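The routing idea above can be sketched as follows. This is a minimal illustration, not the paper's actual scoring policy: the functional form of the threshold and the constants `base`, `alpha`, and `beta` are hypothetical, chosen only so that the threshold falls as bandwidth grows and rises with RTT, matching the monotone dependence the paper proves.

```python
import math

def fallback_threshold(bandwidth_mbps: float, rtt_ms: float,
                       base: float = 0.5, alpha: float = 0.05,
                       beta: float = 0.002) -> float:
    """Illustrative fallback threshold, monotone in network state.

    Higher bandwidth and lower RTT make cloud offloading cheaper, so the
    quality bar an edge agent must clear drops; degraded networks raise it.
    The log1p/linear shape here is an assumption for the sketch.
    """
    return base - alpha * math.log1p(bandwidth_mbps) + beta * rtt_ms

def route(edge_score: float, bandwidth_mbps: float, rtt_ms: float) -> str:
    """Keep a request on the edge when its predicted quality score clears
    the network-dependent threshold; otherwise fall back to the cloud."""
    tau = fallback_threshold(bandwidth_mbps, rtt_ms)
    return "edge" if edge_score >= tau else "cloud"
```

Under this sketch, the same request with the same edge score can be served locally on a fast link but offloaded when RTT spikes, which is exactly the dynamic-threshold behavior the summary describes.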
📝 Abstract
Large language model (LLM) agents at the network edge offer low-latency execution for routine queries, whereas complex requests often require the superior capability of cloud models, incurring higher latency and cost. To navigate this quality-cost trade-off under dynamic network conditions, we propose NetGPT, a cloud-edge synergy framework that integrates network-aware routing with on-edge self-improvement. Specifically, our framework routes structured tool-calling requests to cloud or edge agents via a novel scoring policy. We prove that, under mild regularity assumptions, the optimal routing rule admits a unique fallback threshold with monotone dependence on bandwidth and round-trip time (RTT). Concurrently, using a dataset of requests routed to the cloud and the corresponding cloud responses, we instantiate a schema-preserving reinforcement learning (RL) procedure to improve the capability of the edge agent. We analyze a supervised fine-tuning (SFT)-anchored composite objective that combines a reverse-KL trust-region step with a forward-KL realignment toward the SFT prior, which explains training stability and constrains policy drift. The network-aware routing policy and the edge agent are updated coherently. Experiments across controlled network states and pricing schedules demonstrate smooth quality-cost frontiers, consistent gains of dynamic fallback thresholds over fixed policies, and sustained reductions in offloading while maintaining task success and schema-correct outputs.
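The SFT-anchored composite objective described in the abstract can be written schematically as follows. The symbols are assumptions for this sketch: $\pi_\theta$ is the edge policy being updated, $\pi_{\text{old}}$ the pre-update policy anchoring the trust region, $\pi_{\text{SFT}}$ the supervised fine-tuned prior, $r(x,y)$ a task reward (e.g., schema correctness and task success), and $\beta,\lambda>0$ weighting coefficients; the paper's exact formulation may differ.

$$
\mathcal{L}(\theta)
= \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\!\big[r(x, y)\big]
\;-\; \beta \, \mathrm{KL}\!\big(\pi_\theta \,\|\, \pi_{\text{old}}\big)
\;-\; \lambda \, \mathrm{KL}\!\big(\pi_{\text{SFT}} \,\|\, \pi_\theta\big)
$$

The reverse-KL term $\mathrm{KL}(\pi_\theta \| \pi_{\text{old}})$ keeps each update inside a trust region around the previous policy (mode-seeking, limiting per-step drift), while the forward-KL term $\mathrm{KL}(\pi_{\text{SFT}} \| \pi_\theta)$ is mass-covering and pulls the policy back toward the SFT prior, which is one way to read the abstract's claim that the composite objective stabilizes training and preserves schema-conformant outputs.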