๐ค AI Summary
This work addresses the substantial redundancy in multi-task computer vision, where conventional codecs struggle to effectively disentangle shared and task-specific information. For the first time, the GrayโWyner information-theoretic framework is integrated into an end-to-end learnable codec architecture, yielding a lossy common information modeling approach tailored for multi-task visual representation. The proposed method employs a three-branch network to explicitly decouple shared and task-exclusive content and introduces an optimization objective grounded in lossy common information theory. Evaluated under dual-task settings across six visual benchmarks, the approach consistently outperforms independent encoding schemes while significantly reducing representational redundancy, thereby demonstrating the practical relevance of classical information theory in modern representation learning.
๐ Abstract
Many computer vision tasks share substantial overlapping information, yet conventional codecs tend to ignore this, leading to redundant and inefficient representations. The Gray-Wyner network, a classical concept from information theory, offers a principled framework for separating common and task-specific information. Inspired by this idea, we develop a learnable three-channel codec that disentangles shared information from task-specific details across multiple vision tasks. We characterize the limits of this approach through the notion of lossy common information, and propose an optimization objective that balances inherent tradeoffs in learning such representations. Through comparisons of three codec architectures on two-task scenarios spanning six vision benchmarks, we demonstrate that our approach substantially reduces redundancy and consistently outperforms independent coding. These results highlight the practical value of revisiting Gray-Wyner theory in modern machine learning contexts, bridging classic information theory with task-driven representation learning.