🤖 AI Summary
Existing visual representations struggle to effectively harness the rich prior knowledge embedded in large-scale generative models, limiting both the efficiency and controllability of video compression. This work proposes a unified framework that encodes visual signals as implicit functions, parameterized via LoRA-like low-rank adapters operating on a frozen diffusion foundation model. By compressing multiple video frames into a single compact latent vector, the method enables high-fidelity reconstruction at extremely low bitrates and supports dynamic control over reconstruction quality during inference. Experimental results demonstrate that this paradigm significantly outperforms current approaches in perceptual video compression while offering strong scalability and flexibility.
📝 Abstract
Modern visual generative models acquire rich visual knowledge through large-scale training, yet existing visual representations (such as pixels, latents, or tokens) remain external to the model and cannot directly exploit this knowledge for compact storage or reuse. In this work, we introduce a new visual representation framework that encodes a signal as a function, parameterized by low-rank adaptations attached to a frozen visual generative model. Such an implicit representation of a visual signal, *e.g.*, an 81-frame video, can further be hashed into a single compact vector, achieving strong perceptual video compression at extremely low bitrates. Beyond basic compression, the functional nature of this representation enables inference-time scaling and control, allowing further refinement of compression performance. More broadly, because the implicit representation acts directly as a function of the generation process, this suggests a unified framework bridging visual compression and generation.
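To make the core mechanism concrete, here is a minimal numpy sketch of a low-rank (LoRA-style) adapter applied to a frozen weight matrix, the building block the abstract describes. This is an illustrative toy, not the authors' implementation: the function name, shapes, and scaling convention are assumptions, and a real setup would attach such adapters to the attention layers of a diffusion model and train only `A` and `B`.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """Forward pass through a frozen weight W with a LoRA update.

    The effective weight is W + (alpha / r) * B @ A, where r is the
    adapter rank. Only A and B would be trained; W stays frozen,
    mirroring the setup of attaching low-rank adaptations to a frozen
    generative model. (Helper name and shapes are illustrative.)
    """
    r = A.shape[0]                      # LoRA rank
    delta = (alpha / r) * (B @ A)       # low-rank weight update
    return x @ (W + delta).T

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 16, 2               # a tiny rank-2 adapter
W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection (zero-init)

x = rng.standard_normal((1, d_in))
# With B initialized to zero, the adapter starts as a no-op:
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)

# The adapter stores only r*(d_in + d_out) numbers instead of
# d_out*d_in -- the compactness such a representation exploits.
print(r * (d_in + d_out), "adapter params vs", d_out * d_in, "base params")
```

The last line hints at why the representation is compact: the adapter's parameter count grows linearly, not quadratically, in the layer width, which is what makes hashing a whole video's adapter into a small vector plausible.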