🤖 AI Summary
Visual semantic communication suffers from low efficiency and poor task adaptability over bandwidth-constrained wireless channels. Method: This paper proposes a task-adaptive semantic selection paradigm that dynamically selects optimal semantic representations—such as object labels or compact scene graphs—based on downstream vision tasks (e.g., classification, reconstruction), and introduces a scene graph redundancy filtering mechanism to precisely align semantic granularity with task complexity. The methodology encompasses semantic representation analysis, structured compression, channel simulation, and end-to-end task performance evaluation. Contribution/Results: Experiments demonstrate over 45× improvement in semantic throughput compared to raw image transmission, significantly reducing latency while maintaining high performance across multiple real-time vision tasks. This work establishes the first systematic framework for task-driven semantic selection, introducing a novel paradigm for efficient semantic communication.
📝 Abstract
Recently, semantic communications have drawn great attention as a groundbreaking concept that goes beyond the capacity limits of Shannon theory. In particular, semantic communications are likely to become crucial for visual tasks that demand massive network traffic. Although highly distinctive forms of visual semantics exist for computer vision tasks, a thorough investigation of which visual semantics can be transmitted in time and which are required to complete different visual tasks has not yet been reported. To this end, we first scrutinize the achievable throughput of transmitting existing visual semantics over limited wireless communication bandwidth. We then demonstrate the resulting performance of various visual tasks for each form of visual semantics. Based on these empirical results, we argue that task-adaptive selection of visual semantics is crucial for real-time semantic communications for visual tasks: we transmit basic semantics (e.g., objects in the given image) for simple tasks such as classification, and richer semantics (e.g., scene graphs) for complex tasks such as image regeneration. To further improve transmission efficiency, we propose a filtering method for scene graphs that drops redundant information, so that only the semantics essential for completing the given task are sent. We confirm the efficacy of our task-adaptive semantic communication approach through extensive simulations over wireless channels, showing more than 45 times higher throughput than naive transmission of the original data. Our work can be reproduced from the source code at: https://github.com/jhpark2024/jhpark.github.io
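The selection logic described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the task categories, and in particular the redundancy heuristic (keeping one predicate per subject-object pair) are assumptions for clarity; the paper's actual scene-graph filtering criterion may differ.

```python
# Illustrative sketch of task-adaptive semantic selection with scene-graph
# redundancy filtering. All names and the filtering heuristic are assumptions
# made for this example, not the paper's actual pipeline.

def filter_scene_graph(triples):
    """Drop redundant (subject, predicate, object) triples.

    Toy redundancy rule: keep each subject-object pair only once,
    preferring the first predicate seen.
    """
    seen = set()
    kept = []
    for subj, pred, obj in triples:
        if (subj, obj) not in seen:
            seen.add((subj, obj))
            kept.append((subj, pred, obj))
    return kept

def select_semantics(task, objects, scene_graph):
    """Pick the lightest semantic representation sufficient for the task."""
    # Simple tasks (e.g., classification) need only object labels;
    # complex tasks (e.g., image regeneration) need richer structure.
    if task in {"classification", "detection"}:
        return {"type": "objects", "payload": sorted(set(objects))}
    return {"type": "scene_graph", "payload": filter_scene_graph(scene_graph)}

if __name__ == "__main__":
    graph = [
        ("man", "riding", "horse"),
        ("man", "on", "horse"),   # redundant: same subject-object pair
        ("horse", "on", "beach"),
    ]
    print(select_semantics("classification", ["man", "horse"], graph))
    print(select_semantics("regeneration", ["man", "horse"], graph))
```

The key design point is that the transmitter sends the smallest payload that still supports the downstream task, rather than a fixed representation for all tasks.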