Self-Supervised Learning as Discrete Communication

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of structured semantic organization in the continuous visual representations produced by existing self-supervised learning methods. It reformulates visual self-supervision as a discrete communication process between a teacher and a student network, in which multi-label semantic information is transmitted through a fixed-capacity binary channel. The key innovations are the use of discrete binary codes as semantic carriers, a rate-distortion-inspired coding-rate regularization that encourages compact and reusable representation structure, and the combination of periodic projection-head reinitialization with an element-wise binary cross-entropy loss. The method consistently outperforms continuous alignment baselines across diverse tasks, including image classification, retrieval, dense prediction, and domain transfer, demonstrating that the learned binary codes constitute a compact yet semantically rich discrete visual language.
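
The discrete-agreement objective described above lends itself to a short sketch. The snippet below is a minimal, hypothetical PyTorch rendering, assuming the teacher's logits are binarized into a multi-label message and detached from the gradient graph; all names, dimensions, and the 0.5 threshold are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of discrete teacher-student agreement via element-wise BCE.
# All names, dimensions, and the 0.5 binarization threshold are assumptions.
import torch
import torch.nn.functional as F

def discrete_agreement_loss(student_logits: torch.Tensor,
                            teacher_logits: torch.Tensor) -> torch.Tensor:
    """Element-wise binary cross-entropy between the student's predicted bit
    probabilities and the teacher's binarized multi-label message."""
    with torch.no_grad():  # the teacher acts as a fixed message source
        message = (torch.sigmoid(teacher_logits) > 0.5).float()
    return F.binary_cross_entropy_with_logits(student_logits, message)

# Toy usage: a batch of 32 paired views transmitted over a 256-bit channel.
student_logits = torch.randn(32, 256, requires_grad=True)
teacher_logits = torch.randn(32, 256)
loss = discrete_agreement_loss(student_logits, teacher_logits)
loss.backward()
```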

📝 Abstract
Most self-supervised learning (SSL) methods learn continuous visual representations by aligning different views of the same input, offering limited control over how information is structured across representation dimensions. In this work, we frame visual self-supervised learning as a discrete communication process between a teacher and a student network, where semantic information is transmitted through a fixed-capacity binary channel. Rather than aligning continuous features, the student predicts multi-label binary messages produced by the teacher. Discrete agreement is enforced through an element-wise binary cross-entropy objective, while a coding-rate regularization term encourages effective utilization of the constrained channel, promoting structured representations. We further show that periodically reinitializing the projection head strengthens this effect by encouraging embeddings that remain predictive across multiple discrete encodings. Extensive experiments demonstrate consistent improvements over continuous agreement baselines on image classification, retrieval, and dense visual prediction tasks, as well as under domain shift through self-supervised adaptation. Beyond backbone representations, we analyze the learned binary codes and show that they form a compact and informative discrete language, capturing semantic factors reusable across classes.
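
The coding-rate regularization mentioned in the abstract can also be sketched. The paper describes it only as rate-distortion-inspired; the log-det form below is borrowed from the well-known coding-rate measure used in MCR²-style objectives and is an assumed stand-in for the paper's actual term, not its implementation.

```python
# Hedged sketch of a coding-rate regularizer over relaxed bit probabilities.
# The log-det form follows the MCR^2 coding-rate measure and is an assumed
# stand-in for the paper's rate-distortion-inspired term.
import torch

def coding_rate(z: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z) for an n x d batch Z."""
    n, d = z.shape
    gram = z.T @ z                                   # d x d second-moment matrix
    eye = torch.eye(d, device=z.device, dtype=z.dtype)
    return 0.5 * torch.logdet(eye + (d / (n * eps ** 2)) * gram)

# Maximizing the rate of centered bit probabilities spreads the codes over
# the fixed-capacity channel instead of letting them collapse onto a few bits.
probs = torch.sigmoid(torch.randn(32, 256))
reg = -coding_rate(probs - probs.mean(dim=0, keepdim=True))
```

Periodic projection-head reinitialization likewise reduces to a few lines in a training loop. The schedule below is hypothetical; the reset interval and layer sizes are assumptions, not values from the paper.

```python
# Hypothetical schedule for periodic projection-head reinitialization:
# re-drawing the head's weights forces the backbone embedding to remain
# predictive across successive, fresh discrete encodings.
import torch.nn as nn

head = nn.Linear(2048, 256)   # backbone and code dimensions are assumptions
reinit_every = 10_000         # assumed interval, not specified here

for step in range(1, 100_001):
    if step % reinit_every == 0:
        head.reset_parameters()  # built-in re-draw of weights and bias
    # ... forward pass, discrete-agreement loss, optimizer step ...
```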
Problem

Research questions and friction points this paper is trying to address.

Self-Supervised Learning
Discrete Representation
Visual Representation
Structured Information
Binary Communication
Innovation

Methods, ideas, or system contributions that make the work stand out.

discrete communication
self-supervised learning
binary codes
coding-rate regularization
structured representations
Kawtar Zaher
INRIA, LIRMM, Université de Montpellier, Montpellier, France; Institut National de l’Audiovisuel, Paris, France
Ilyass Moummad
Postdoctoral Researcher, Inria IROKO, Montpellier
Deep Learning, Computer Vision, Machine Listening
Olivier Buisson
Institut National de l’Audiovisuel, Paris, France
Alexis Joly
Research Director, Inria, Montpellier University, LIRMM
machine learning, biodiversity, information retrieval, plant identification