A Task Decomposition and Planning Framework for Efficient LLM Inference in AI-Enabled WiFi-Offload Networks

📅 2026-04-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses large language model (LLM) inference in AI-enabled WiFi offload networks, where heterogeneous model capabilities, wireless contention, uncertain task complexity, and semantic correlation among reasoning tasks hinder efficient execution. The authors propose a user-edge collaborative framework in which each task can be executed locally, offloaded directly to a nearby edge access point (AP), or decomposed into subtasks that run across local and edge nodes. An LLM-based planner performs the decomposition and also predicts each subtask's difficulty and expected output token length, which improves the estimates of execution quality and latency on heterogeneous nodes. Building on these estimates, a decomposition-aware scheduling strategy jointly optimizes subtask assignment, execution, and result aggregation under communication, queuing, and computation constraints. In simulation, the approach reduces average latency by 20% and improves overall reward by 80% relative to local-only and nearest-edge baselines. A distilled lightweight planner approaches the performance of the large teacher model while being better suited to edge deployment.

📝 Abstract

AI WiFi offload is emerging as a promising approach for providing large language model (LLM) services to resource-constrained wireless devices. However, unlike conventional edge computing, LLM inference over WiFi must jointly address heterogeneous model capabilities, wireless contention, uncertain task complexity, and semantic correlation among reasoning tasks. In this paper, we investigate LLM inference offloading in a multi-user multi-edge WiFi network, where each task can be executed locally, directly offloaded to a nearby edge access point (AP), or decomposed into multiple subtasks for collaborative execution across local and edge nodes. To this end, we propose a user-edge collaborative framework with an LLM-based planner that not only performs task decomposition but also infers subtask difficulty and expected output token length, enabling more accurate estimation of execution quality and latency on heterogeneous nodes. Based on these estimates, we further design a decomposition-aware scheduling strategy that jointly optimizes subtask assignment, execution, and aggregation under communication, queuing, and computation constraints. Simulation results show that the proposed framework achieves a better latency-accuracy tradeoff than local-only and nearest-edge baselines, reducing the average latency by 20% and improving the overall reward by 80%. Moreover, the distilled lightweight planner approaches the performance of the large teacher model while remaining more suitable for practical edge deployment.
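
To make the abstract's idea concrete, the following is a minimal Python sketch of how planner predictions (subtask difficulty and output token length) could feed a latency and quality estimate and a greedy assignment. It is not the authors' algorithm: the `Node` and `Subtask` fields, the quality model, the reward `quality - lam * latency`, and the greedy queue update are all assumptions made for illustration, and wireless contention and result aggregation are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    tokens_per_s: float  # assumed decoding throughput of the node's model
    uplink_bps: float    # assumed uplink rate; float("inf") for the local device
    queue_s: float       # current queuing delay in seconds
    capability: float    # hypothetical model-strength score in [0, 1]


@dataclass
class Subtask:
    prompt_bits: float   # size of the prompt to transmit
    difficulty: float    # planner-predicted difficulty in [0, 1]
    out_tokens: int      # planner-predicted output length in tokens


def latency(task: Subtask, node: Node) -> float:
    """Communication + queuing + computation delay, in seconds."""
    comm = task.prompt_bits / node.uplink_bps  # 0.0 when uplink is infinite
    comp = task.out_tokens / node.tokens_per_s
    return comm + node.queue_s + comp


def quality(task: Subtask, node: Node) -> float:
    """Hypothetical quality model: degrades once difficulty exceeds capability."""
    return max(0.0, 1.0 - max(0.0, task.difficulty - node.capability))


def assign(tasks, nodes, lam=0.1):
    """Greedily place each subtask on the node with the best assumed reward."""
    plan = []
    for t in tasks:
        best = max(nodes, key=lambda n: quality(t, n) - lam * latency(t, n))
        plan.append((best.name, latency(t, best)))
        # The chosen node stays busy for the subtask's compute time.
        best.queue_s += t.out_tokens / best.tokens_per_s
    return plan


if __name__ == "__main__":
    nodes = [
        Node("local", tokens_per_s=20, uplink_bps=float("inf"), queue_s=0.0, capability=0.4),
        Node("edge", tokens_per_s=100, uplink_bps=1e6, queue_s=0.5, capability=0.9),
    ]
    tasks = [
        Subtask(prompt_bits=8000, difficulty=0.2, out_tokens=10),
        Subtask(prompt_bits=8000, difficulty=0.8, out_tokens=200),
    ]
    for name, delay in assign(tasks, nodes):
        print(f"{name}: {delay:.3f} s")
```

With these illustrative numbers, a short and easy subtask tends to stay on the local device, while a long and difficult one is pushed to the stronger edge node despite its queue and transmission delay. This is the kind of tradeoff that predicted difficulty and output length are meant to expose.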

Problem

Research questions and friction points this paper is trying to address.

LLM inference

WiFi offloading

task decomposition

heterogeneous edge computing

semantic correlation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Task Decomposition

LLM-based Planner

User-Edge Collaboration