AI Summary
To address degraded semantic communication performance under wireless channel noise and constrained spectrum resources, this paper proposes a CLIP-driven semantic communication framework that eliminates joint training. Methodologically, it leverages a pre-trained CLIP model to directly extract image-text aligned semantic representations, enabling a fine-tuning-free semantic encoder and decoder; furthermore, a proximal policy optimization (PPO)-based reinforcement learning mechanism is designed to jointly optimize the lightweight model architecture and dynamic spectrum resource allocation. The key innovation lies in the first integration of zero-shot CLIP into semantic communication, achieving end-to-end co-optimization of semantic representation learning, model compression, and resource scheduling. Experimental results demonstrate that, compared to the SAC baseline, the proposed method improves convergence speed by up to 40% and increases cumulative reward by 4x, significantly enhancing system robustness and spectral efficiency.
Abstract
In this paper, a novel contrastive language-image pre-training (CLIP) model based semantic communication framework is designed. Compared to standard neural network (e.g., convolutional neural network) based semantic encoders and decoders that require joint training over a common dataset, our CLIP model based method does not require any training procedure, enabling the transmitter to extract the meaning of the original data without neural network training, and the receiver to train a neural network for follow-up task implementation without communicating with the transmitter. Next, we investigate the deployment of the CLIP model based semantic framework over a noisy wireless network. Since the semantic information generated by the CLIP model is susceptible to wireless noise and the spectrum available for semantic information transmission is limited, it is necessary to jointly optimize the CLIP model architecture and spectrum resource block (RB) allocation to maximize semantic communication performance while accounting for wireless noise and the delay and energy used for semantic communication. To achieve this goal, we use a proximal policy optimization (PPO) based reinforcement learning (RL) algorithm to learn how wireless noise affects semantic communication performance and thereby find the optimal CLIP model and RB for each user. Simulation results show that our proposed method improves the convergence rate by up to 40% and the accumulated reward by 4x compared to soft actor-critic (SAC).
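To make the joint optimization concrete, the sketch below enumerates a toy joint action space of (CLIP variant, resource block) pairs and scores each pair with a reward that trades semantic performance against delay and energy, mirroring the objective structure the abstract describes. All model names, SNR values, cost figures, and weights here are illustrative assumptions, not values from the paper; in the proposed system this selection is learned by a PPO policy rather than found by exhaustive search.

```python
# Hypothetical sketch of the joint (CLIP architecture, resource block)
# decision described in the abstract. Every number below is an assumption
# made for illustration only.

# Candidate CLIP variants: (name, base semantic accuracy,
#                           energy per inference [J], inference delay [s])
CLIP_VARIANTS = [
    ("ViT-B/32", 0.85, 0.4, 0.020),
    ("ViT-B/16", 0.89, 0.9, 0.045),
    ("ViT-L/14", 0.93, 2.1, 0.110),
]

# Candidate resource blocks: (id, SNR [dB], transmission delay [s])
RESOURCE_BLOCKS = [
    (0, 5.0, 0.030),
    (1, 12.0, 0.018),
    (2, 20.0, 0.010),
]

def semantic_accuracy(base_acc, snr_db):
    """Toy channel model: wireless noise degrades semantic fidelity,
    and higher SNR reduces the degradation."""
    noise_penalty = max(0.0, 15.0 - snr_db) * 0.01
    return max(0.0, base_acc - noise_penalty)

def reward(variant, rb, w_delay=2.0, w_energy=0.05):
    """Reward = semantic performance minus weighted delay and energy,
    the same cost structure the abstract's objective combines."""
    _, base_acc, energy, infer_delay = variant
    _, snr_db, tx_delay = rb
    acc = semantic_accuracy(base_acc, snr_db)
    return acc - w_delay * (infer_delay + tx_delay) - w_energy * energy

def best_joint_choice():
    """Exhaustive search over the small joint action space; the paper
    instead trains a PPO agent to learn this mapping per user."""
    return max(
        ((v, rb) for v in CLIP_VARIANTS for rb in RESOURCE_BLOCKS),
        key=lambda pair: reward(*pair),
    )

if __name__ == "__main__":
    variant, rb = best_joint_choice()
    print(f"best variant: {variant[0]}, best RB: {rb[0]}")
```

With these illustrative weights the search prefers a lighter CLIP variant on a clean resource block, since the delay and energy penalties of the largest model outweigh its accuracy gain; a learned policy would additionally adapt this choice as channel conditions change.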