Contrastive Language-Image Pre-Training Model based Semantic Communication Performance Optimization

📅 2025-07-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address degraded semantic communication performance under wireless channel noise and constrained spectrum resources, this paper proposes a CLIP-driven semantic communication framework that eliminates joint training. Methodologically, it leverages a pre-trained CLIP model to directly extract image–text aligned semantic representations, enabling a fine-tuning-free semantic encoder and decoder; further, a Proximal Policy Optimization (PPO)-based reinforcement learning mechanism is designed to jointly optimize the lightweight model architecture and dynamic spectrum resource allocation. The key innovation lies in the first integration of zero-shot CLIP into semantic communication, achieving end-to-end co-optimization of semantic representation learning, model compression, and resource scheduling. Experimental results demonstrate that, compared to the SAC baseline, the proposed method improves convergence speed by up to 40% and increases cumulative reward by 4×, significantly enhancing system robustness and spectral efficiency.

πŸ“ Abstract
In this paper, a novel contrastive language-image pre-training (CLIP) model based semantic communication framework is designed. Compared to standard neural network (e.g., convolutional neural network) based semantic encoders and decoders that require joint training over a common dataset, our CLIP model based method does not require any training procedure, enabling a transmitter to extract the meaning of the original data without neural network training, and the receiver to train a neural network for follow-up tasks without communicating with the transmitter. Next, we investigate the deployment of the CLIP model based semantic framework over a noisy wireless network. Since the semantic information generated by the CLIP model is susceptible to wireless noise and the spectrum used for semantic information transmission is limited, it is necessary to jointly optimize the CLIP model architecture and spectrum resource block (RB) allocation to maximize semantic communication performance while accounting for wireless noise and the delay and energy used for semantic communication. To achieve this goal, we use a proximal policy optimization (PPO) based reinforcement learning (RL) algorithm to learn how wireless noise affects semantic communication performance, thus finding the optimal CLIP model and RB for each user. Simulation results show that our proposed method improves the convergence rate by up to 40% and the accumulated reward by 4× compared to soft actor-critic (SAC).
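The training-free pipeline described in the abstract can be illustrated with a toy numerical sketch. Here a fixed random projection stands in for the frozen pre-trained CLIP encoder (no real CLIP weights are used), and an AWGN channel models the wireless noise the paper studies; all dimensions, names, and SNR values are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pre-trained CLIP image encoder: a fixed random
# projection maps raw data to a unit-norm semantic embedding. No training
# happens at the transmitter, mirroring the training-free design.
EMBED_DIM, INPUT_DIM = 32, 64
W = rng.standard_normal((EMBED_DIM, INPUT_DIM))

def semantic_encode(x):
    z = W @ x
    return z / np.linalg.norm(z)

def awgn_channel(z, snr_db):
    # Additive white Gaussian noise; total noise power is scaled so that
    # the SNR is measured against the unit-norm embedding as a whole.
    noise_power = 10 ** (-snr_db / 10)
    noise = rng.standard_normal(z.shape) * np.sqrt(noise_power / z.size)
    return z + noise

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x = rng.standard_normal(INPUT_DIM)
z = semantic_encode(x)
z_good = awgn_channel(z, snr_db=20)   # good channel
z_bad = awgn_channel(z, snr_db=-5)    # bad channel

sim_clean = cosine_similarity(z, z_good)
sim_noisy = cosine_similarity(z, z_bad)
print(f"semantic similarity at 20 dB: {sim_clean:.3f}")
print(f"semantic similarity at -5 dB: {sim_noisy:.3f}")
```

The falling cosine similarity at low SNR is the sensitivity that motivates the paper's joint optimization of the CLIP model architecture and RB allocation.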
Problem

Research questions and friction points this paper is trying to address.

Optimizing CLIP-based semantic communication without joint training
Jointly optimizing CLIP model and RB allocation under noise
Enhancing performance via PPO-based RL algorithm
Innovation

Methods, ideas, or system contributions that make the work stand out.

CLIP model enables training-free semantic extraction
PPO-based RL optimizes CLIP and RB allocation
Wireless noise-resistant semantic communication framework
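As a rough sketch of the joint optimization idea, the snippet below uses a simplified policy-gradient (REINFORCE) learner, a deliberately lighter stand-in for the paper's PPO agent, to pick a CLIP model variant and an RB jointly. The accuracy, energy, and delay numbers are invented for illustration and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical action space: 3 CLIP model variants (larger = higher semantic
# accuracy but more energy) x 4 resource blocks (higher index = better SNR
# gain but more delay). All numbers are illustrative, not from the paper.
MODEL_ACC    = np.array([0.70, 0.85, 0.95])   # semantic accuracy proxy
MODEL_ENERGY = np.array([0.10, 0.30, 0.60])   # energy cost
RB_SNR_GAIN  = np.array([0.00, 0.08, 0.12, 0.15])
RB_DELAY     = np.array([0.02, 0.04, 0.15, 0.35])
N_RB = len(RB_SNR_GAIN)
N_ACTIONS = len(MODEL_ACC) * N_RB

def reward(action):
    # Reward trades semantic accuracy against energy and delay; a small
    # noise term plays the role of channel randomness.
    m, rb = divmod(action, N_RB)
    acc = MODEL_ACC[m] + RB_SNR_GAIN[rb] + rng.normal(0.0, 0.02)
    return acc - MODEL_ENERGY[m] - RB_DELAY[rb]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# REINFORCE-style update of a softmax policy over the joint (model, RB)
# action, with a running-average baseline to reduce gradient variance.
theta = np.zeros(N_ACTIONS)
baseline = 0.0
for _ in range(3000):
    probs = softmax(theta)
    a = rng.choice(N_ACTIONS, p=probs)
    r = reward(a)
    baseline += 0.01 * (r - baseline)
    grad = -probs
    grad[a] += 1.0                    # gradient of log pi(a)
    theta += 0.1 * (r - baseline) * grad

best = int(np.argmax(softmax(theta)))
m, rb = divmod(best, N_RB)
print(f"learned choice: model variant {m}, RB {rb}")
```

PPO would replace the plain policy-gradient step with a clipped surrogate objective and a learned critic, which is what gives it the stability the paper exploits; this sketch only shows the shape of the joint action space and reward trade-off.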
Shaoran Yang
Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL, 33146, USA
Dongyu Wei
University of Miami
Wireless communication, Optimization, Machine Learning
Hanzhi Yu
Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL, 33146, USA
Zhaohui Yang
College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China
Yuchen Liu
Department of Computer Science, NC State University, Raleigh, NC, 27695, USA
Mingzhe Chen
Assistant Professor, Electrical and Computer Engineering Department, University of Miami
Machine learning, digital network twins, unmanned aerial vehicles, semantic communications.