🤖 AI Summary
To address the challenge of efficiently executing feedforward neural networks (FFNNs) on resource-constrained user equipment (UE) in mobile networks, this paper proposes a generic, retraining-free model partitioning and collaborative inference framework. The framework strategically partitions FFNNs across UEs, edge nodes, and the core network, enabling joint execution while optimizing computational load distribution and inference latency through a heuristic optimization algorithm and cross-layer resource scheduling in heterogeneous networks. Key contributions include: (i) a plug-and-play, model-agnostic partitioning strategy; (ii) a balanced trade-off between robustness and deployment efficiency; and (iii) no requirement for model restructuring or retraining. Experimental results in representative heterogeneous scenarios demonstrate a reduction in UE memory footprint of over 33.6%, a decrease in CPU utilization of over 60%, and tightly controlled inference latency—collectively enhancing the feasibility of on-device AI inference.
📝 Abstract
With mobile networks expected to support services with stringent requirements for high-quality user experience, the ability to apply Feed-Forward Neural Network (FFNN) models to User Equipment (UE) use cases has become critical. Given that UEs have limited resources, running FFNNs directly on UEs is an intrinsically challenging problem. This letter proposes an optimization framework for split computing applications in which an FFNN model is partitioned into multiple sections and executed by UEs, edge-located nodes, and core-located nodes, reducing the required UE computational footprint while containing the inference time. An efficient heuristic strategy for solving the optimization problem is also provided. The proposed framework is shown to be robust in heterogeneous settings, eliminating the need for retraining and reducing the UE's memory (CPU) footprint by over 33.6% (60%).
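The split-computing idea above can be sketched with a toy example: a small FFNN is cut into contiguous segments at hypothetical layer boundaries, each segment is run on a different node (UE, edge, core), and only intermediate activations cross the network. This is a minimal illustration, not the paper's actual optimizer; the cut points here are chosen arbitrarily, whereas the paper selects them via its heuristic optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy FFNN: a stack of (weights, bias) layers with ReLU activations.
layers = [(rng.standard_normal((d_in, d_out)), rng.standard_normal(d_out))
          for d_in, d_out in [(8, 16), (16, 16), (16, 4)]]

def run(segment, x):
    """Run a contiguous slice of layers on input x."""
    for W, b in segment:
        x = np.maximum(x @ W + b, 0.0)  # affine layer + ReLU
    return x

# Hypothetical cut points: layer 0 on the UE, layer 1 at the edge node,
# layer 2 in the core network (the paper's heuristic would choose these).
ue_part, edge_part, core_part = layers[:1], layers[1:2], layers[2:]

x = rng.standard_normal(8)
# Only intermediate activations travel between nodes; the UE never
# holds the edge or core weights, shrinking its memory footprint.
y_split = run(core_part, run(edge_part, run(ue_part, x)))
y_full = run(layers, x)
assert np.allclose(y_split, y_full)  # partitioning is retraining-free
```

Because the segments compose to the original forward pass, no retraining or restructuring is needed, which mirrors the framework's plug-and-play claim.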