Latency-aware Human-in-the-Loop Reinforcement Learning for Semantic Communications

📅 2026-02-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the trade-off between semantic fidelity and latency constraints in semantic communication for immersive and safety-critical services. The authors propose the time-constrained human-in-the-loop reinforcement learning (TC-HITL-RL) framework, which operates within a semantic-aware Open RAN architecture and formulates human-feedback-driven semantic adaptation as a constrained Markov decision process (CMDP) that integrates semantic utility, human feedback, and latency control. The CMDP is solved with a primal-dual proximal policy optimization (PPO) algorithm augmented with action shielding and latency-aware reward shaping, so that semantic reward is optimized subject to the delay constraints. Simulations over point-to-multipoint links with heterogeneous per-user deadlines show that the method consistently meets per-user timing constraints, achieves higher reward than baseline schedulers, and stabilizes resource consumption.

📝 Abstract
Semantic communication promises task-aligned transmission but must reconcile semantic fidelity with stringent latency guarantees in immersive and safety-critical services. This paper introduces a time-constrained human-in-the-loop reinforcement learning (TC-HITL-RL) framework that embeds human feedback, semantic utility, and latency control within a semantic-aware Open radio access network (RAN) architecture. We formulate semantic adaptation driven by human feedback as a constrained Markov decision process (CMDP) whose state captures semantic quality, human preferences, queue slack, and channel dynamics, and solve it via a primal-dual proximal policy optimization algorithm with action shielding and latency-aware reward shaping. The resulting policy preserves PPO-level semantic rewards while tightening the variability of both air-interface and near-real-time RAN intelligent controller processing budgets. Simulations over point-to-multipoint links with heterogeneous deadlines show that TC-HITL-RL consistently meets per-user timing constraints, outperforms baseline schedulers in reward, and stabilizes resource consumption, providing a practical blueprint for latency-aware semantic adaptation.
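
The abstract names three mechanisms: action shielding, latency-aware reward shaping, and a primal-dual update. The sketch below illustrates one generic way these pieces can fit together in a Lagrangian-style constrained RL loop. It is not the paper's implementation: the function names, the linear violation penalty, the fallback rule in the shield, and the step size `lr` are all assumptions made for illustration.

```python
import numpy as np

def shield_actions(latencies, slack):
    """Mask out actions whose predicted latency exceeds the remaining slack.

    Assumption: each action has a known latency estimate. If no action fits,
    the fastest one is allowed so the policy always has a valid choice.
    """
    latencies = np.asarray(latencies, dtype=float)
    mask = latencies <= slack
    if not mask.any():
        mask[np.argmin(latencies)] = True
    return mask

def masked_softmax(logits, mask):
    """Turn policy logits into probabilities with shielded actions set to zero."""
    z = np.where(mask, np.asarray(logits, dtype=float), -np.inf)
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def shaped_reward(semantic_reward, latency, deadline, lam):
    """Semantic reward minus a multiplier-weighted deadline violation (assumed linear)."""
    violation = max(0.0, latency - deadline)
    return semantic_reward - lam * violation

def dual_update(lam, mean_latency, deadline, lr=0.05):
    """Projected dual ascent: raise lam when the deadline is violated on average,
    lower it otherwise, and never let it go below zero."""
    return max(0.0, lam + lr * (mean_latency - deadline))
```

In a full training loop the shaped reward would feed a standard PPO policy update (the primal step), while `dual_update` would run once per batch on the observed average latency (the dual step). With heterogeneous deadlines, one multiplier per user is a natural extension of this sketch.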
Problem

Research questions and friction points this paper is trying to address.

Semantic Communication
Latency Constraint
Human-in-the-Loop
Reinforcement Learning
Time-Critical Services
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latency-Aware
Human-in-the-Loop
Semantic Communication
Constrained MDP
Open RAN