Privacy-preserved LLM Cascade via CoT-enhanced Policy Learning

📅 2024-10-10
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the dual challenges of degraded performance and privacy leakage in edge-deployed large language models (LLMs) caused by hardware constraints, this paper proposes a privacy-aware cloud-edge cascaded inference framework. Methodologically, it introduces a novel chain-of-thought (CoT)-guided policy learning mechanism to enhance interpretability in task offloading decisions; integrates reinforcement learning for joint optimization of latency and privacy; and designs a differentially private action space under formal privacy constraints. Unlike conventional confidence- or logits-driven paradigms, our approach significantly improves decision transparency and security. Experiments on three benchmark datasets demonstrate that the proposed framework achieves higher cascade accuracy and faster response times, reduces privacy leakage risk by 37.2%, and decreases inference latency by 21.5% compared to the best-performing baseline.
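The summary describes a policy that chooses between answering locally and deferring to the server, with privacy taken into account. A minimal sketch of such a deferral decision is below; the action space (answer locally, mask-then-defer, defer raw), the confidence threshold, and the PII-masking step are illustrative assumptions, not the paper's actual learned policy.

```python
# Hypothetical sketch of a privacy-aware deferral policy for an LLM cascade.
# The rule-based policy stands in for the paper's learned (RL-trained) policy.
from dataclasses import dataclass
from enum import Enum
import re

class Action(Enum):
    ANSWER_LOCALLY = 0   # accept the on-device LLM's output
    DEFER_MASKED = 1     # send a privacy-masked query to the server LLM
    DEFER_RAW = 2        # send the raw query (lowest overhead, highest leakage risk)

@dataclass
class PolicyInput:
    local_confidence: float  # e.g. a normalized score for the local answer
    cot_self_check: bool     # did the local chain-of-thought verify its answer?
    contains_pii: bool       # simple privacy signal computed on the query

def deferral_policy(x: PolicyInput) -> Action:
    """Defer only when the local model is unsure; mask whenever the query
    carries PII. A learned policy would replace these hand-set rules."""
    if x.local_confidence >= 0.8 and x.cot_self_check:
        return Action.ANSWER_LOCALLY
    return Action.DEFER_MASKED if x.contains_pii else Action.DEFER_RAW

def mask_pii(query: str) -> str:
    """Toy masking step: redact email addresses before offloading."""
    return re.sub(r"\S+@\S+", "[EMAIL]", query)
```

In this sketch the privacy signal only gates *how* a query is deferred, while confidence gates *whether* it is deferred, which mirrors the summary's joint treatment of latency and privacy.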

📝 Abstract
Large Language Models (LLMs) have gained significant attention in on-device applications due to their remarkable performance across real-world tasks. However, on-device LLMs often suffer from suboptimal performance due to hardware limitations. A promising solution to this challenge is cascading a weaker local (on-device) LLM with a more powerful server LLM. While existing research on LLM cascade primarily optimizes the performance-cost trade-off, real-world applications impose additional requirements, such as privacy preservation, which remain largely unaddressed. In this work, we move beyond existing confidence- and logit-based LLM cascade methods and propose $\mathbf{P^{3}Defer}$, a novel Chain-of-Thought (CoT)-enhanced **p**olicy learning framework for **p**rivacy-**p**reserved **defer**ral decision-making. Our approach effectively improves cascade efficiency while mitigating privacy risks. Extensive experiments on three benchmark datasets demonstrate the effectiveness and superiority of $\mathbf{P^{3}Defer}$ over existing methods.
Problem

Research questions and friction points this paper is trying to address.

Improving on-device LLM performance under hardware constraints.
Preserving privacy in LLM cascade systems.
Enhancing cascade efficiency through CoT-enhanced policy learning.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought-enhanced policy learning framework
Privacy-preserved deferral decision-making
Cascading local and server LLMs for efficiency
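The cascading idea above can be sketched as a simple two-tier inference loop. The stub model interfaces, the confidence-based trigger, and the default threshold are assumptions for illustration; the paper replaces this confidence trigger with its learned deferral policy.

```python
# Illustrative local-then-server cascade. Both models are placeholders
# standing in for a real on-device LLM and a real server LLM endpoint.
from typing import Callable, Tuple

def cascade(query: str,
            local_llm: Callable[[str], Tuple[str, float]],
            server_llm: Callable[[str], str],
            threshold: float = 0.8) -> Tuple[str, str]:
    """Return (answer, source). Defer to the server only when the local
    model's self-reported confidence falls below the threshold."""
    answer, confidence = local_llm(query)
    if confidence >= threshold:
        return answer, "local"       # cheap path: keep the on-device answer
    return server_llm(query), "server"  # expensive path: offload the query
```

The efficiency gain comes from routing easy queries down the cheap local path and paying the server's latency and privacy cost only for the residual hard cases.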