🤖 AI Summary
This study addresses the efficient deployment of small language model (SLM) inference in edge–cloud continuum environments, balancing low latency, strong privacy, low operational cost, and reliability. We propose a platform-level adaptive inference paradigm grounded in an edge-first design principle and a quantitative trade-off framework, rejecting one-size-fits-all strategies. Our methodology integrates model compression, benchmarking at both the single-device and edge-cluster levels, and multi-dimensional evaluation across latency, cost, and reliability. Key contributions include: (i) the first systematic characterization of feasibility boundaries for SLMs on resource-constrained edge devices; (ii) a reusable, context-aware deployment decision guide; and (iii) empirical improvements over purely cloud-based inference, achieving a 37% average latency reduction and 22% lower operational cost on representative tasks. The framework enables principled, environment-aware SLM deployment across heterogeneous edge–cloud infrastructures.
📝 Abstract
The widespread adoption of Language Models (LMs) across industries is driving interest in deploying these services across the computing continuum, from the cloud to the network edge. This shift aims to reduce costs, lower latency, and improve reliability and privacy. Small Language Models (SLMs), enabled by advances in model compression, are central to this shift, offering a path to on-device inference on resource-constrained edge platforms. This work examines the interplay between edge and cloud deployments, starting from detailed benchmarking of SLM capabilities on single edge devices, and extending to distributed edge clusters. We identify scenarios where edge inference offers comparable performance with lower costs, and others where cloud fallback becomes essential due to limits in scalability or model capacity. Rather than proposing a one-size-fits-all solution, we present platform-level comparisons and design insights for building efficient, adaptive LM inference systems across heterogeneous environments.