AI Summary
This work addresses the key challenges of efficient task offloading in mobile edge computing, where dynamic task arrivals, time-varying channels, and the spatio-temporal coupling of server queues complicate decision-making. Existing approaches often lack adaptability, generalize poorly, and overlook long-term system dynamics. To overcome these limitations, we propose COMLLM, a novel framework that, for the first time, integrates large language models with multi-step reasoning into task offloading. COMLLM leverages multi-step Monte Carlo tree search and a Look-Ahead Collaborative Simulation mechanism to jointly model queue dynamics for proactive decisions, while employing Group Relative Policy Optimization to enhance long-term performance. Without requiring retraining, it achieves zero-shot generalization to unseen large-scale topologies, attaining near-optimal latency and significantly improved load-balancing fairness, outperforming supervised fine-tuning, deep reinforcement learning, and heuristic baselines across the board.
Abstract
Emerging computation-intensive applications impose stringent latency requirements on resource-constrained mobile devices. Mobile Edge Computing (MEC) addresses this challenge through task offloading. However, designing effective policies remains difficult due to dynamic task arrivals, time-varying channels, and the spatio-temporal coupling of server queues. Conventional heuristics lack adaptability, while Deep Reinforcement Learning (DRL) suffers from limited generalization and architectural rigidity, requiring retraining when network topology changes. Although Large Language Models (LLMs) offer semantic reasoning capabilities, standard Supervised Fine-Tuning (SFT) yields myopic policies that greedily minimize immediate latency without accounting for long-term system evolution. To address these limitations, we propose COMLLM, a generative framework that enables foresighted decision-making in MEC systems. COMLLM integrates Group Relative Policy Optimization (GRPO) with a Look-Ahead Collaborative Simulation (LACS) mechanism, which performs multi-step Monte Carlo rollouts while jointly modeling server queue dynamics. By incorporating these rollouts into the reward design, the framework captures the long-term impact of current decisions on future system states. Experimental results demonstrate that COMLLM achieves near-optimal latency and improved load-balancing fairness. Notably, it exhibits zero-shot topological scalability, allowing a model trained on small-scale networks to generalize to larger, unseen topologies without retraining, outperforming SFT, DRL, and heuristic baselines.
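The core idea of combining GRPO with look-ahead rollouts can be illustrated with a toy sketch. The abstract does not specify the queue model or reward function, so everything below is an assumption for illustration only: `rollout_latency` is a hypothetical simulator that plays the system forward a few steps after an offloading decision, and `group_relative_advantages` implements the GRPO-style normalization (each sampled decision's reward centered by the group mean and scaled by the group standard deviation), which would then weight policy-gradient updates in the actual framework.

```python
import random

def rollout_latency(decision, queues, horizon=3, seed=0):
    """Toy look-ahead simulation (hypothetical, not the paper's LACS):
    offload one unit task to server `decision`, then roll the queues
    forward `horizon` steps and accumulate a latency proxy equal to
    the backlog experienced on the chosen server."""
    rng = random.Random(seed)          # fixed seed: common random numbers across decisions
    q = list(queues)
    q[decision] += 1.0                 # enqueue the offloaded task
    total = q[decision]                # waiting-time proxy at arrival
    for _ in range(horizon):
        for i in range(len(q)):
            q[i] = max(0.0, q[i] - 1.0)  # each server drains one unit of work
            q[i] += rng.random()         # stochastic new arrivals
        total += q[decision]             # future congestion on the chosen server
    return total

def group_relative_advantages(rewards):
    """GRPO-style normalization: advantage = (reward - group mean) / group std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0            # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

queues = [2.0, 0.5, 4.0]               # current backlog per edge server (made up)
group = [0, 1, 2]                      # a group of sampled offloading decisions
# Reward is negative look-ahead latency, so lower latency -> higher reward.
rewards = [-rollout_latency(d, queues) for d in group]
advs = group_relative_advantages(rewards)
best = group[advs.index(max(advs))]    # the lightly loaded server wins
```

In this sketch the rollouts make the reward reflect future congestion rather than only the instantaneous queue length, which is the mechanism the abstract credits for avoiding myopic, greedily latency-minimizing policies.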