🤖 AI Summary
To address the escalating transmission and storage burden caused by rapidly growing surveillance video volumes, and to overcome the poor generalizability of existing semantic communication methods—particularly their reliance on large-scale scene-specific training data—this paper proposes a few-shot semantic coding framework tailored for surveillance scenarios. The method employs sketches as lightweight semantic representations and integrates sketch-based compression, reference-frame-guided conditional image translation, and a few-shot generative decoding network, under a joint semantic–pixel optimization training paradigm. We introduce, for the first time, a “sketch-driven semantic compression and cross-modal reconstruction co-design mechanism,” enabling rapid adaptation to new scenes with only a few samples per scenario. Experiments on multiple surveillance datasets demonstrate that our approach achieves 3.2–5.8 dB PSNR gain over baselines, reduces semantic bit-rate by 76%, and maintains controlled SSIM degradation (ΔSSIM < 0.02), significantly outperforming both conventional codecs and state-of-the-art semantic communication methods.
📝 Abstract
With the continuous increase in the number and resolution of video surveillance cameras, the burden of transmitting and storing surveillance video is growing. Traditional communication methods based on Shannon's theory are facing optimization bottlenecks. Semantic communication, as an emerging communication method, is expected to break through this bottleneck and reduce the storage and transmission consumption of video. Existing semantic decoding methods often require many samples to train the neural network for each scene, which is time-consuming and labor-intensive. In this study, a semantic encoding and decoding method for surveillance video is proposed. First, the sketch was extracted as semantic information, and a sketch compression method was proposed to reduce the bit rate of semantic information. Then, an image translation network was proposed to translate the sketch into a video frame with a reference frame. Finally, a few-shot sketch decoding network was proposed to reconstruct video from sketch. Experimental results showed that the proposed method achieved significantly better video reconstruction performance than baseline methods. The sketch compression method could effectively reduce the storage and transmission consumption of semantic information with little compromise on video quality. The proposed method provides a novel semantic encoding and decoding method that only needs a few training samples for each surveillance scene, thus improving the practicality of the semantic communication system.