🤖 AI Summary
This work addresses the high bandwidth consumption, latency, and difficulty of deployment on resource-constrained devices that limit existing video semantic communication methods for video question answering. The authors propose a lightweight semantic communication framework built around a novel Chrono-Color Stacking mechanism, which losslessly maps temporal video information into a single static color image. This enables extreme temporal compression and explicit visual reconstruction without complex spatiotemporal modeling. Combined with a lightweight DeepJSCC transceiver and a pre-trained BLIP vision-language model, the system achieves up to 192× bandwidth compression on the CLEVRER dataset while maintaining competitive question-answering accuracy.
📝 Abstract
Semantic communication (SC) aims to reduce transmission overhead by conveying task-relevant information rather than raw data. However, existing SC approaches for video largely focus on pixel-level reconstruction or rely on complex spatiotemporal pipelines, leading to bandwidth usage and latency that are too high for low-resource deployments. In this paper, we propose ChronoSC, a task-oriented semantic communication framework for Video Question Answering (VideoQA). ChronoSC introduces Chrono-Color Stacking, a lightweight and lossless projection scheme that encodes temporal video dynamics into a single static image, enabling extreme temporal compression before transmission. This compact semantic representation is transmitted using a lightweight Deep Joint Source-Channel Coding (DeepJSCC) transceiver and explicitly reconstructed at the receiver. Unlike latent-space methods, explicit visual reconstruction enables the direct reuse of pre-trained vision-language models; specifically, a pre-trained BLIP model is employed to infer answers from noisy, reconstructed chrono-images. Experiments on the CLEVRER dataset show that ChronoSC achieves up to a 192× bandwidth reduction compared to raw video transmission while maintaining high VideoQA accuracy.
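The abstract does not specify how Chrono-Color Stacking maps frames to colors, so the following is only an illustrative sketch of the general idea of folding time into the color channels of one image. It assumes a hypothetical scheme in which three grayscale snapshots (first, middle, last) become the R, G, and B channels; the function names `to_grayscale`, `chrono_color_stack`, and `temporal_compression_ratio` are invented for this example. Unlike the projection described in the paper, this simplified version is lossy, and it does not attempt to reproduce the reported 192× figure.

```python
import numpy as np


def to_grayscale(frames: np.ndarray) -> np.ndarray:
    """Convert (T, H, W, 3) uint8 RGB frames to (T, H, W) float32 luminance."""
    # ITU-R BT.601 luma weights; the choice of weights is an assumption here.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return frames.astype(np.float32) @ weights


def chrono_color_stack(frames: np.ndarray) -> np.ndarray:
    """Hypothetical stacking: first, middle, and last frames become R, G, B.

    This is NOT the paper's mapping, only one simple way to encode
    temporal order as color in a single (H, W, 3) image.
    """
    gray = to_grayscale(frames)
    t = gray.shape[0]
    picks = [0, t // 2, t - 1]  # three time steps, in temporal order
    stacked = np.stack([gray[i] for i in picks], axis=-1)
    return np.clip(np.rint(stacked), 0, 255).astype(np.uint8)


def temporal_compression_ratio(frames: np.ndarray, image: np.ndarray) -> float:
    """Ratio of raw video values to values in the stacked image."""
    return frames.size / image.size


# Usage on a synthetic 16-frame clip (random pixels, for shape checking only).
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(16, 32, 48, 3), dtype=np.uint8)
image = chrono_color_stack(video)
ratio = temporal_compression_ratio(video, image)
```

Under this toy scheme the reduction from stacking alone equals the number of frames in the clip; any further reduction would have to come from the transceiver's channel coding, which is not modeled here.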