Learning Through Noise: Why Subliminal Learning Works and When It Fails

📅 2026-05-22
🀖 AI Summary
This study investigates how, in subliminal learning, a teacher model transfers task-relevant knowledge to a student through task-unrelated input–output pairs, in particular when student and teacher do not share an initialization. The authors build a multi-head output architecture on MNIST that separates an auxiliary head from the classification head, and combine random initialization of hidden layers, architectural changes (e.g., MLP to CNN), representational similarity analysis, and theoretical derivation. They find that transfer depends on compatible output heads rather than on matched initialization, give a theoretical account of the mechanism, and derive upper bounds on when subliminal learning fails. Even with randomly initialized or architecturally different hidden layers, a student with a compatible auxiliary head can recover the teacher's signal from noise alone; when the classification head is also compatible, the student can approach, and in favorable regimes match, teacher-level performance.
📝 Abstract
In the context of artificial neural networks, subliminal learning refers to the transfer of task-relevant knowledge or unintended biases from teacher to student models through distillation on task-unrelated input–output pairs. Prior explanations tie this effect to shared or closely matched teacher–student initialization. We show that a closely matched initialization is not necessary. Instead, subliminal learning is governed by compatible output heads. Using a controlled MNIST setting, we split outputs into an auxiliary head (for auxiliary, task-unrelated noise signals) and a class head (for classification) to demonstrate that subliminal learning occurs even when we randomly initialize hidden layers and remove layers, add new layers, or change the architecture (MLP-to-CNN). Compatible auxiliary heads enable transfer of a recoverable teacher signal, bringing the student's representations closer to the teacher's. When the class heads remain compatible as well, students trained only on task-unrelated noise can approach, and in favorable regimes match, teacher-level task performance. Our setting enables us to develop a theory that explains the mechanism of subliminal learning and to derive upper bounds on when subliminal learning fails. Together, our results turn subliminal learning from a surprising transfer effect into a theoretically grounded mechanism with predictable limits.
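The two-head setup described in the abstract can be illustrated with a small toy sketch. This is not the authors' code or their MNIST experiment. It assumes a one-hidden-layer tanh network, Gaussian noise inputs, a mean-squared-error distillation loss on the auxiliary outputs, and it models "compatible heads" in the simplest possible way, as teacher and student sharing the same frozen head weights. All names (`distill_on_noise`, `W_aux`, `W_cls`) and all sizes are hypothetical choices made for illustration.

```python
import numpy as np


def forward(W_hid, W_head, X):
    """Hidden tanh layer followed by a linear output head."""
    H = np.tanh(X @ W_hid.T)
    return H, H @ W_head.T


def distill_on_noise(seed=0, d_in=10, d_hid=8, n_aux=16, n_cls=3,
                     n_noise=512, steps=1000, lr=0.1):
    rng = np.random.default_rng(seed)

    # Assumption: "compatible" heads are modelled as identical, frozen heads.
    W_aux = rng.normal(0.0, 1.0 / np.sqrt(d_hid), size=(n_aux, d_hid))
    W_cls = rng.normal(0.0, 1.0 / np.sqrt(d_hid), size=(n_cls, d_hid))

    # Teacher and student hidden layers are drawn independently,
    # so they do not share an initialization.
    W_teacher = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_hid, d_in))
    W_student = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_hid, d_in))

    # Task-unrelated inputs: pure Gaussian noise.
    X = rng.normal(size=(n_noise, d_in))
    _, aux_target = forward(W_teacher, W_aux, X)
    _, cls_teacher = forward(W_teacher, W_cls, X)

    def aux_loss(W):
        _, aux = forward(W, W_aux, X)
        return float(np.mean((aux - aux_target) ** 2))

    def class_agreement(W):
        # Fraction of noise inputs where student and teacher pick the same class.
        _, cls = forward(W, W_cls, X)
        return float(np.mean(cls.argmax(axis=1) == cls_teacher.argmax(axis=1)))

    before = (aux_loss(W_student), class_agreement(W_student))

    # Full-batch gradient descent on the auxiliary-head loss only;
    # the class head never contributes to the training signal.
    for _ in range(steps):
        H, aux = forward(W_student, W_aux, X)
        err = aux - aux_target
        grad_H = 2.0 * (err @ W_aux) / err.size   # gradient w.r.t. hidden activations
        grad_pre = grad_H * (1.0 - H ** 2)        # back through tanh
        W_student -= lr * (grad_pre.T @ X)        # gradient w.r.t. hidden weights

    after = (aux_loss(W_student), class_agreement(W_student))
    return {
        "aux_loss_before": before[0], "aux_loss_after": after[0],
        "class_agreement_before": before[1], "class_agreement_after": after[1],
    }


if __name__ == "__main__":
    for name, value in distill_on_noise().items():
        print(f"{name}: {value:.4f}")
```

In this sketch the auxiliary head has more outputs than there are hidden units, so matching the teacher's auxiliary outputs constrains the full hidden representation. That is one simple way to make the teacher signal recoverable; the paper's own compatibility conditions and failure bounds are not reproduced here.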
Problem

Research questions and friction points this paper is trying to address.

subliminal learning
knowledge distillation
task-unrelated signals
neural network transfer
output head compatibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

subliminal learning
knowledge distillation
compatible output heads
task-agnostic training
theoretical bounds
🔎 Similar Papers
Vincent C. Brockers
Max Planck Institute for Dynamics and Self-Organization, Göttingen; Faculty of Physics, Institute for the Dynamics of Complex Systems, University of Göttingen
Roman D. Ventzke
Max Planck Institute for Dynamics and Self-Organization, Göttingen; Faculty of Physics, Institute for the Dynamics of Complex Systems, University of Göttingen
Valentin Neuhaus
PhD Student in Physics, Max Planck Institute for Dynamics and Self-Organization
Information Theory | Machine Learning | Partial Information Decomposition | Physics | Neural Networks
Belén Hidalgo-Ogalde
Max Planck Institute for Dynamics and Self-Organization, Göttingen; Faculty of Physics, Institute for the Dynamics of Complex Systems, University of Göttingen
Viola Priesemann
Max Planck Institute for Dynamics and Self-Organization
Neuroscience | Physics | Societal Dynamics