AI Summary
To address degraded semantic communication performance under wireless channel noise and constrained spectrum resources, this paper proposes a CLIP-driven semantic communication framework that eliminates joint training. Methodologically, it leverages a pre-trained CLIP model to directly extract image-text aligned semantic representations, enabling a fine-tuning-free semantic encoder and decoder; furthermore, a proximal policy optimization (PPO)-based reinforcement learning mechanism is designed to jointly optimize the lightweight model architecture and dynamic spectrum resource allocation. The key innovation lies in the first integration of zero-shot CLIP into semantic communication, achieving end-to-end co-optimization of semantic representation learning, model compression, and resource scheduling. Experimental results demonstrate that, compared to the SAC baseline, the proposed method improves convergence speed by up to 40% and increases cumulative reward by 4x, significantly enhancing system robustness and spectral efficiency.
Abstract
In this paper, a novel contrastive language-image pre-training (CLIP) model based semantic communication framework is designed. Compared to standard neural network (e.g., convolutional neural network) based semantic encoders and decoders that require joint training over a common dataset, our CLIP model based method does not require any training procedure, enabling the transmitter to extract the meaning of the original data without neural network training, and the receiver to train a neural network for follow-up task implementation without communicating with the transmitter. Next, we investigate the deployment of the CLIP model based semantic framework over a noisy wireless network. Since the semantic information generated by the CLIP model is susceptible to wireless noise and the spectrum available for semantic information transmission is limited, it is necessary to jointly optimize the CLIP model architecture and spectrum resource block (RB) allocation to maximize semantic communication performance while accounting for wireless noise and the delay and energy used for semantic communication. To achieve this goal, we use a proximal policy optimization (PPO) based reinforcement learning (RL) algorithm to learn how wireless noise affects semantic communication performance and thereby find the optimal CLIP model and RB for each user. Simulation results show that our proposed method improves the convergence rate by up to 40% and the accumulated reward by 4x compared to soft actor-critic (SAC).
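To make the joint optimization concrete, the sketch below enumerates a toy joint action space of (CLIP variant, resource block) pairs and scores each pair with a reward that trades semantic performance against delay and energy, mirroring the objective structure the abstract describes. All model names, SNR values, cost figures, and weights here are illustrative assumptions, not values from the paper; in the proposed system this selection is learned by a PPO policy rather than found by exhaustive search.

```python
# Hypothetical sketch of the joint (CLIP architecture, resource block)
# decision described in the abstract. Every number below is an assumption
# made for illustration only.

# Candidate CLIP variants: (name, base semantic accuracy,
#                           energy per inference [J], inference delay [s])
CLIP_VARIANTS = [
    ("ViT-B/32", 0.85, 0.4, 0.020),
    ("ViT-B/16", 0.89, 0.9, 0.045),
    ("ViT-L/14", 0.93, 2.1, 0.110),
]

# Candidate resource blocks: (id, SNR [dB], transmission delay [s])
RESOURCE_BLOCKS = [
    (0, 5.0, 0.030),
    (1, 12.0, 0.018),
    (2, 20.0, 0.010),
]

def semantic_accuracy(base_acc, snr_db):
    """Toy channel model: wireless noise degrades semantic fidelity,
    and higher SNR reduces the degradation."""
    noise_penalty = max(0.0, 15.0 - snr_db) * 0.01
    return max(0.0, base_acc - noise_penalty)

def reward(variant, rb, w_delay=2.0, w_energy=0.05):
    """Reward = semantic performance minus weighted delay and energy,
    the same cost structure the abstract's objective combines."""
    _, base_acc, energy, infer_delay = variant
    _, snr_db, tx_delay = rb
    acc = semantic_accuracy(base_acc, snr_db)
    return acc - w_delay * (infer_delay + tx_delay) - w_energy * energy

def best_joint_choice():
    """Exhaustive search over the small joint action space; the paper
    instead trains a PPO agent to learn this mapping per user."""
    return max(
        ((v, rb) for v in CLIP_VARIANTS for rb in RESOURCE_BLOCKS),
        key=lambda pair: reward(*pair),
    )

if __name__ == "__main__":
    variant, rb = best_joint_choice()
    print(f"best variant: {variant[0]}, best RB: {rb[0]}")
```

With these illustrative weights the search prefers a lighter CLIP variant on a clean resource block, since the delay and energy penalties of the largest model outweigh its accuracy gain; a learned policy would additionally adapt this choice as channel conditions change.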