AI Summary
This work addresses the challenge of guaranteeing statistical end-to-end latency and accuracy quality-of-service in multi-cell edge intelligence systems under spatiotemporal uncertainty. To this end, the authors propose a joint wireless and computational resource pre-deployment optimization framework. By integrating Poisson point processes, queueing theory, and empirical AI inference workload measurements, they establish a unified stochastic modeling framework and, for the first time, derive an analytically tractable expression for end-to-end offloading latency. The resulting non-convex joint optimization problem is decomposed into convex subproblems, yielding a globally optimal solution. The study further uncovers fundamental trade-offs among base-station density, cell size, transmission latency, computational cost, and user fairness, and identifies a cost-efficient design regime in interference-limited scenarios.
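The Poisson-point-process topology model mentioned above can be illustrated with a minimal Monte Carlo sketch. This is not the paper's model; it only shows the basic construction: base stations and users are drawn from independent homogeneous PPPs, each user associates with its nearest base station, and the empirical mean access distance is compared against the known closed form 1/(2√λ) for nearest-neighbour distances in a homogeneous PPP. The intensity values are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ppp(lam, side, rng):
    """Sample a homogeneous Poisson point process of intensity `lam`
    on a square window [0, side] x [0, side]."""
    n = rng.poisson(lam * side * side)          # Poisson-distributed point count
    return rng.uniform(0.0, side, size=(n, 2))  # uniform locations given the count

lam_bs = 5.0   # base stations per unit area (hypothetical value)
side = 20.0
bs = sample_ppp(lam_bs, side, rng)
users = sample_ppp(50.0, side, rng)  # users per unit area (hypothetical value)

# Nearest-base-station association: distance from each user to its closest BS.
dists = np.linalg.norm(users[:, None, :] - bs[None, :, :], axis=2).min(axis=1)

# For a homogeneous PPP of intensity lam, the nearest-neighbour distance is
# Rayleigh distributed with mean 1 / (2 * sqrt(lam)).
print(dists.mean())                  # empirical mean access distance
print(1.0 / (2.0 * np.sqrt(lam_bs)))  # theoretical mean, ~0.2236
```

In the paper this random access distance feeds into path loss, fractional power control, and inter-cell interference; the sketch stops at the geometric layer those quantities are built on.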
Abstract
Edge intelligence enables AI inference at the network edge, co-located with or near the radio access network, rather than in centralized clouds or on mobile devices. It targets low-latency, resource-constrained applications with large data volumes, requiring tight integration of wireless access and on-site computing. Yet system performance and cost-efficiency hinge on joint pre-deployment dimensioning of radio and computational resources, especially under spatial and temporal uncertainty. Prior work largely emphasizes run-time allocation or relies on simplified models that decouple radio and computing, missing end-to-end correlations in large-scale deployments. This paper introduces a unified stochastic framework to dimension multi-cell edge-intelligent systems. We model network topology with Poisson point processes, capturing random user and base-station locations, inter-cell interference, distance-based fractional power control, and peak-power constraints. By combining this with queueing theory and empirical AI inference workload profiling, we derive tractable expressions for end-to-end offloading delay. These enable a non-convex joint optimization that minimizes deployment cost under statistical QoS guarantees, expressed through strict tail-latency and inference-accuracy constraints. We prove the problem decomposes into convex subproblems, yielding global optimality. Numerical results in noise- and interference-limited regimes identify cost-efficient design regions and configurations that cause under-utilization or user unfairness. Smaller cells reduce transmission delay but raise per-request computing cost due to weaker server multiplexing, whereas larger cells show the opposite trend. Densification reduces computational costs only when frequency reuse scales with base-station density; otherwise, sparser deployments improve fairness and efficiency in interference-limited settings.
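The statistical QoS constraint described above — a strict bound on the latency tail — can be made concrete with a toy queueing calculation. The sketch below uses a plain M/M/1 server, not the paper's end-to-end delay model, purely to show the dimensioning logic: given an arrival rate, a deadline, and a tail probability ε, the exponential sojourn-time tail P(T > d) = exp(−(μ − λ)d) can be inverted to find the smallest service rate μ meeting the constraint. All numbers are hypothetical.

```python
import math

def min_service_rate(arrival_rate, deadline, eps):
    """Smallest M/M/1 service rate mu such that the sojourn-time tail
    P(T > deadline) = exp(-(mu - arrival_rate) * deadline) is at most eps.
    Inverting the tail gives mu = lambda - ln(eps) / deadline."""
    return arrival_rate - math.log(eps) / deadline

# Hypothetical dimensioning example: 100 requests/s, 10 ms deadline,
# at most 0.1% of requests may miss it.
mu = min_service_rate(arrival_rate=100.0, deadline=0.01, eps=1e-3)
print(mu)  # ~790.8 requests/s of required service capacity

# Sanity check: the tail probability at the deadline equals eps.
tail = math.exp(-(mu - 100.0) * 0.01)
print(tail)  # ~1e-3
```

The pre-deployment problem in the paper plays this game jointly across radio and compute, with PPP-driven interference and empirical inference workloads replacing the exponential toy assumptions.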