🤖 AI Summary
To address high latency, excessive cost, weak privacy guarantees, and capability-resource mismatch in deploying generative AI across cloud-edge environments, this paper proposes a framework for edge-cloud collaborative inference. It defines four collaboration paradigms, designs hierarchical scheduling principles, and introduces a lightweight communication protocol. The method integrates model partitioning, dynamic offloading, quantization-aware gradient exchange, and cache-enhanced prompt routing. Evaluated on a hybrid testbed of Jetson AGX edge devices and a cloud cluster, the framework reduces end-to-end latency by 63%, cuts communication overhead by 71%, and maintains inference quality above 92% of large language model (LLM) baselines. This work delivers a scalable, system-level solution for efficient, secure, and high-fidelity deployment of generative AI in resource-constrained edge environments.
📝 Abstract
The rapid adoption of generative AI (GenAI), particularly Large Language Models (LLMs), has exposed critical limitations of cloud-centric deployments, including high latency, high cost, and privacy risks. Meanwhile, Small Language Models (SLMs) are emerging as viable alternatives for resource-constrained edge environments, though they often lack the capabilities of their larger counterparts. This article explores collaborative inference systems that leverage both edge and cloud resources to address these challenges. By presenting distinct cooperation strategies alongside practical design principles and experimental insights, we offer actionable guidance for deploying GenAI across the computing continuum.
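One cooperation strategy in the spirit of the edge-cloud collaboration described above is confidence-gated cascading: the edge SLM answers first, and only low-confidence queries are escalated to the cloud LLM. The sketch below is purely illustrative and not the paper's actual protocol; `edge_generate`, `cloud_generate`, the toy confidence heuristic, and the 0.7 threshold are all assumptions.

```python
# Minimal sketch of confidence-gated edge-cloud cascade routing (illustrative,
# not the framework from the paper). `edge_generate` and `cloud_generate` are
# hypothetical stand-ins for an on-device SLM and a remote LLM service.

from dataclasses import dataclass


@dataclass
class Result:
    text: str
    confidence: float  # e.g. mean token probability reported by the SLM
    served_by: str     # "edge" or "cloud"


def edge_generate(prompt: str) -> Result:
    # Placeholder: a real system would run a quantized SLM on the device
    # and derive confidence from its token-level probabilities.
    conf = 0.9 if len(prompt) < 40 else 0.4  # toy heuristic for the sketch
    return Result(text=f"[edge answer to: {prompt}]",
                  confidence=conf, served_by="edge")


def cloud_generate(prompt: str) -> Result:
    # Placeholder: a real system would call a cloud-hosted LLM endpoint.
    return Result(text=f"[cloud answer to: {prompt}]",
                  confidence=1.0, served_by="cloud")


def route(prompt: str, threshold: float = 0.7) -> Result:
    """Serve locally when the SLM is confident; otherwise escalate to cloud."""
    local = edge_generate(prompt)
    if local.confidence >= threshold:
        return local
    return cloud_generate(prompt)
```

In this pattern, the threshold trades quality for latency and cost: raising it sends more traffic to the cloud, lowering it keeps more queries on the edge.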