Task-oriented Age of Information for Remote Inference with Hybrid Language Models

📅 2025-04-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the trade-off between accuracy and timeliness in remote inference systems. To jointly capture task-specific utility and information freshness, the authors propose Task-oriented Age of Information (TAoI), a metric that couples temporal staleness with downstream task objectives. To minimize TAoI, they design a hybrid LLM/SLM inference system and jointly optimize three interdependent components: adaptive image resolution selection, task-aware model routing (LLM vs. SLM), and wireless transmission scheduling. The problem is formulated as a Semi-Markov Decision Process (SMDP), the optimal policy is proven to have a threshold structure, and a relative policy iteration algorithm that exploits this structure is developed to solve it efficiently. Simulation results show that the approach outperforms baseline methods in the accuracy-timeliness trade-off: TAoI is reduced by 37%, while end-to-end latency and error rate are jointly optimized.

📝 Abstract

Large Language Models (LLMs) have revolutionized the field of artificial intelligence (AI) through their advanced reasoning capabilities, but their extensive parameter sets introduce significant inference latency, which makes it difficult to keep inference results timely. Small Language Models (SLMs) offer faster inference with fewer parameters, but they often sacrifice accuracy on complex tasks. This study proposes a novel remote inference system comprising a user, a sensor, and an edge server that integrates both model types alongside a decision maker. The system dynamically determines the resolution of images transmitted by the sensor and routes each inference task to either an SLM or an LLM to optimize performance. The key objective is to minimize the Task-oriented Age of Information (TAoI), which jointly accounts for the accuracy and timeliness of the inference task. Because transmission and inference times are non-uniform, we formulate this problem as a Semi-Markov Decision Process (SMDP). By converting the SMDP to an equivalent Markov decision process, we prove that the optimal control policy follows a threshold-based structure. We further develop a relative policy iteration algorithm leveraging this threshold property. Simulation results demonstrate that our proposed optimal policy significantly outperforms baseline approaches in managing the accuracy-timeliness trade-off.
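
The SMDP-to-MDP conversion mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's model: the state (a truncated age), the two actions (route to SLM or LLM), the sojourn times in `DUR`, the success probabilities in `P_OK`, and the age-sum cost are all hypothetical choices made for illustration. It applies the standard textbook data transformation for average-cost SMDPs (divide each cost by its expected sojourn time and add a self-loop) and then solves the resulting MDP with relative value iteration, a simpler relative of the relative policy iteration named in the abstract.

```python
import numpy as np

# All numbers below are hypothetical; they are not taken from the paper.
A_MAX = 20                      # truncate the age so the state space is finite
DUR = np.array([1, 3])          # sojourn time per action: 0 = SLM, 1 = LLM (slots)
P_OK = np.array([0.6, 0.95])    # probability that the inference result is correct


def build_smdp():
    """Toy SMDP: state index s corresponds to age s + 1."""
    n = A_MAX
    P = np.zeros((2, n, n))     # P[a, s, s'] transition probabilities
    cost = np.zeros((2, n))     # expected cost accumulated over one sojourn
    tau = np.zeros((2, n))      # expected sojourn time
    for act in range(2):
        d = int(DUR[act])
        for s in range(n):
            age = s + 1
            ok_state = min(d, A_MAX) - 1            # success: age resets to d
            fail_state = min(age + d, A_MAX) - 1    # failure: age keeps growing
            P[act, s, ok_state] += P_OK[act]
            P[act, s, fail_state] += 1.0 - P_OK[act]
            cost[act, s] = sum(min(age + k, A_MAX) for k in range(d))
            tau[act, s] = d
    return P, cost, tau


def to_mdp(P, cost, tau):
    """Standard data transformation from an average-cost SMDP to an MDP."""
    n = P.shape[1]
    eye = np.eye(n)
    eta = 0.5 * tau.min()       # safe choice: keeps every diagonal entry positive
    scale = (eta / tau)[:, :, None]
    P_t = eye[None, :, :] + scale * (P - eye[None, :, :])
    c_t = cost / tau            # cost per unit time
    return P_t, c_t


def relative_value_iteration(P_t, c_t, tol=1e-9, max_iter=100_000):
    """Return (average cost per unit time, relative values, greedy policy)."""
    h = np.zeros(P_t.shape[1])
    for _ in range(max_iter):
        q = c_t + P_t @ h       # q[a, s]
        h_new = q.min(axis=0)
        gain = h_new[0]         # state 0 is the reference state
        h_new = h_new - gain
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    return gain, h_new, q.argmin(axis=0)


if __name__ == "__main__":
    P_t, c_t = to_mdp(*build_smdp())
    gain, h, policy = relative_value_iteration(P_t, c_t)
    print("average cost per unit time:", gain)
    print("action per age (0 = SLM, 1 = LLM):", policy)
```

The self-loop introduced by `eta` makes the transformed chain aperiodic, which is what lets the plain iteration converge; the transformed MDP has the same optimal average cost per unit time as the original SMDP.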

Problem

Research questions and friction points this paper is trying to address.

Minimize Task-oriented Age of Information (TAoI) for remote inference

Balance accuracy and timeliness in hybrid SLM/LLM inference systems

Optimize dynamic image resolution selection and model routing via an SMDP whose optimal policy is threshold-based

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid SLM/LLM system with dynamic task routing

SMDP optimization for accuracy-timeliness trade-off

Threshold-based policy for optimal inference control
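
One way to see why a threshold structure helps: once the optimal policy is known to switch actions at most once along an ordered state, the search shrinks from every possible action assignment to a single switching point. The sketch below is a minimal illustration under assumptions, not the paper's algorithm. The function names, the ordering of states, the choice of which action sits above the threshold, and the `evaluate` callback (which should return the long-run cost of a policy) are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def threshold_policy(n_states: int, threshold: int,
                     low_action: int = 0, high_action: int = 1) -> List[int]:
    """Use low_action for states below the threshold, high_action otherwise."""
    return [low_action if s < threshold else high_action
            for s in range(n_states)]


def has_threshold_structure(policy: Sequence[int]) -> bool:
    """True if the policy changes action at most once across ordered states."""
    switches = sum(1 for i in range(1, len(policy))
                   if policy[i] != policy[i - 1])
    return switches <= 1


def best_threshold(evaluate: Callable[[List[int]], float],
                   n_states: int) -> Tuple[int, float]:
    """Search n_states + 1 thresholds instead of 2 ** n_states policies.

    `evaluate` is a hypothetical user-supplied policy evaluator.
    """
    candidates = range(n_states + 1)
    costs = [evaluate(threshold_policy(n_states, t)) for t in candidates]
    best = min(candidates, key=lambda t: costs[t])
    return best, costs[best]


if __name__ == "__main__":
    # Toy evaluator (an assumption): cheapest when exactly 3 states use action 0.
    toy_cost = lambda pol: abs(sum(1 for a in pol if a == 0) - 3)
    print(best_threshold(toy_cost, 6))
```

In a full solver, `evaluate` would be replaced by a policy-evaluation step on the transformed MDP, and the same restriction could be applied inside each policy improvement step.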
