🤖 AI Summary
To address insufficient information fidelity in video compression under constrained computational resources, this paper proposes an information-uniqueness-driven video compression framework. The core innovation lies in modeling visual token uniqueness via conditional entropy minimization to quantify and suppress both inter-frame and intra-frame redundancy. Based on this principle, we design three key components: a frame-group fusion module, a semantic-aware dynamic token allocation mechanism, and a fine-grained spatially adaptive compression module. Departing from conventional fixed-rate-distortion optimization, our method jointly optimizes representation uniqueness and computational cost (FLOPs). Experiments on mainstream benchmarks demonstrate that, under identical FLOPs constraints, our approach consistently outperforms state-of-the-art video compression models—achieving superior PSNR and MS-SSIM scores while also improving performance on downstream tasks. These results validate the effectiveness of information uniqueness modeling for efficient video representation learning.
📝 Abstract
Distinct from attention-based compression methods, this paper presents an information-uniqueness-driven video compression framework, termed UniComp, which aims to maximize the information fidelity of video representations under constrained computational budgets. From an information-theoretic perspective, we formulate visual compression as an optimization problem that minimizes the conditional entropy (reconstruction error) of the full token set given the retained tokens. To this end, we introduce the notion of information uniqueness, which measures the intrinsic redundancy among tokens and links it to reconstruction error. Based on uniqueness, we design three modules (Frame Group Fusion, Token Allocation, and Spatial Dynamic Compression) that progressively perform semantic frame grouping, adaptive resource allocation, and fine-grained spatial compression. Extensive experiments demonstrate that UniComp consistently outperforms existing compression methods in preserving essential visual tokens under limited computational budgets, highlighting the pivotal role of information uniqueness in effective token compression.
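The conditional-entropy objective described in the abstract can be sketched as follows. This is a reconstruction of the stated idea, not the paper's exact notation: the symbols $X$, $S$, and the budget $B$ are assumptions introduced here for illustration.

```latex
% X = \{x_1, \dots, x_N\}: the full set of visual tokens.
% S \subseteq X: the subset of tokens retained after compression.
% B: the computational budget (e.g., a FLOPs constraint).
\min_{S \subseteq X} \; H(X \mid S)
\quad \text{s.t.} \quad \mathrm{FLOPs}(S) \le B
```

Intuitively, minimizing $H(X \mid S)$ favors keeping tokens from which the discarded ones are most predictable; a token's "uniqueness" would then correspond to how much entropy remains about it given the others, so unique tokens are retained while redundant ones are dropped.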