🤖 AI Summary
Problem: In mobile edge general intelligence (MEGI) environments, autonomous LLM inference faces a fundamental tension between high computational overhead and severely constrained edge-device resources. Method: We propose a joint optimization framework that integrates adaptive chain-of-thought (CoT) prompting with a distributed mixture-of-experts (MoE) architecture to enable dynamic inference: reasoning depth and the number of activated experts are adjusted in real time according to task complexity and device capability. The framework further incorporates supervised fine-tuning and lightweight dynamic resource scheduling. Contribution/Results: Experiments show that the framework significantly improves inference efficiency (2.3× speedup) and deployment scalability on resource-constrained edge devices while preserving privacy and real-time responsiveness. To the best of our knowledge, this is the first work to achieve practical, high-quality autonomous LLM inference in MEGI scenarios.
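The dynamic-inference idea in the summary, which chooses reasoning depth and expert count per request, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration. The normalized `task_complexity` and `device_capacity` scores, the caps `max_cot_steps` and `max_experts`, and the rule that the smaller of the two scores sets the budget are all hypothetical and do not describe the paper's actual policy.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceConfig:
    cot_steps: int       # maximum chain-of-thought reasoning steps to allow
    active_experts: int  # number of MoE experts (top-k) to activate per token


def select_inference_config(
    task_complexity: float,  # hypothetical score in [0, 1]; higher means a harder task
    device_capacity: float,  # hypothetical score in [0, 1]; higher means more compute/memory
    max_cot_steps: int = 8,  # hypothetical upper bound on reasoning depth
    max_experts: int = 4,    # hypothetical upper bound on activated experts
) -> InferenceConfig:
    """Choose reasoning depth and expert count for one request.

    Assumed heuristic (not the paper's policy): task complexity sets the
    demand, device capacity caps it, and both knobs scale with the smaller
    of the two scores, never dropping below one.
    """
    for name, score in (("task_complexity", task_complexity),
                        ("device_capacity", device_capacity)):
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {score}")
    budget = min(task_complexity, device_capacity)
    cot_steps = max(1, math.ceil(budget * max_cot_steps))
    active_experts = max(1, math.ceil(budget * max_experts))
    return InferenceConfig(cot_steps=cot_steps, active_experts=active_experts)


if __name__ == "__main__":
    # A hard task on a mid-range device: the device budget caps both knobs.
    print(select_inference_config(0.9, 0.5))
    # InferenceConfig(cot_steps=4, active_experts=2)
```

Taking the minimum of the two scores keeps easy tasks cheap even on capable devices, while hard tasks on weak devices are held to what the device can afford.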
📝 Abstract
The rapid advancement of large language models (LLMs) has enabled the emergence of agentic artificial intelligence (AI) with powerful reasoning and autonomous decision-making capabilities. Integrating agentic AI with edge computing has led to the development of Mobile Edge General Intelligence (MEGI), which brings real-time, privacy-preserving reasoning to the network edge. However, deploying LLM-based agentic AI reasoning in MEGI environments is challenging because reasoning is computationally demanding while edge devices have limited resources. To address these challenges, we propose a joint optimization framework for efficiently deploying LLM reasoning in MEGI. First, we review methods that enhance LLM reasoning capabilities, such as Chain-of-Thought (CoT) prompting, Supervised Fine-Tuning (SFT), and Mixture of Experts (MoE). Next, we present a distributed framework that addresses two correlated aspects: reasoning enhancement through adaptive CoT prompting and scalable deployment through a distributed MoE architecture. The framework dynamically activates expert networks and adjusts reasoning depth based on task complexity and device capabilities. We then evaluate the framework experimentally in mobile edge environments. The results demonstrate its effectiveness in balancing reasoning quality with resource efficiency, validating the practical viability of deploying sophisticated LLM reasoning in resource-constrained MEGI environments.
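The distributed MoE side of the framework activates only a few experts per input, and those experts may reside on different edge devices. One generic way to express this is top-k softmax gating, followed by dispatch to the hosting devices and a gate-weighted combination of the expert outputs. The sketch below is a standard top-k MoE router written for illustration only. The function names, the `placement` map from expert index to device name, and the device names themselves are hypothetical, and the paper's actual routing and placement scheme may differ.

```python
import math
from collections import defaultdict


def top_k_gate(logits: list[float], k: int) -> dict[int, float]:
    """Softmax over the k highest-scoring experts; all other experts get weight 0."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Stable sort keeps the lower expert index first when logits tie.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def dispatch_to_devices(
    gates: dict[int, float], placement: dict[int, str]
) -> dict[str, list[tuple[int, float]]]:
    """Group the selected experts by the (hypothetical) edge device hosting each one."""
    plan: dict[str, list[tuple[int, float]]] = defaultdict(list)
    for expert, weight in gates.items():
        plan[placement[expert]].append((expert, weight))
    return dict(plan)


def combine(outputs: dict[int, list[float]], gates: dict[int, float]) -> list[float]:
    """Gate-weighted sum of the output vectors returned by the activated experts."""
    dim = len(outputs[next(iter(gates))])
    result = [0.0] * dim
    for expert, weight in gates.items():
        for j, value in enumerate(outputs[expert]):
            result[j] += weight * value
    return result


if __name__ == "__main__":
    placement = {0: "phone", 1: "phone", 2: "gateway", 3: "gateway"}  # hypothetical devices
    gates = top_k_gate([0.0, 2.0, 1.0, 2.0], k=2)
    print(gates)                                  # {1: 0.5, 3: 0.5}
    print(dispatch_to_devices(gates, placement))  # {'phone': [(1, 0.5)], 'gateway': [(3, 0.5)]}
    print(combine({1: [1.0, 2.0], 3: [3.0, 4.0]}, gates))  # [2.0, 3.0]
```

In this sketch, `k` is the knob a per-request policy would lower on weaker devices. Only the devices that appear in the dispatch plan need to run anything for a given input.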