Who Taught You That? Tracing Teachers in Model Distillation

📅 2025-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the “teacher attribution” problem in model distillation: can the teacher large language model (LLM) used for distillation be identified solely from the student model’s outputs? To tackle this, we propose a novel pedagogical fingerprint based on part-of-speech (PoS) templates, showing that PoS sequences in student outputs robustly inherit teacher-specific syntactic preferences and exhibit greater discriminability and robustness than conventional n-gram similarity. Under a black-box teacher assumption, we design a lightweight discriminative model that jointly encodes PoS templates and lexical statistical features. Experiments across summarization, question answering, and instruction-following tasks achieve high teacher identification accuracy. Our results demonstrate that PoS templates constitute a generalizable, low-overhead, and highly informative paradigm for teacher provenance, with significant implications for LLM copyright protection and regulatory compliance auditing.

📝 Abstract
Model distillation -- using outputs from a large teacher model to teach a small student model -- is a practical means of creating efficient models for a particular task. We ask: Can we identify a student's teacher based on its outputs? Such "footprints" left by teacher LLMs would be interesting artifacts. Beyond this, reliable teacher inference may have practical implications as actors seek to distill specific capabilities of massive proprietary LLMs into deployed smaller LMs, potentially violating terms of service. We consider practical task distillation targets including summarization, question answering, and instruction-following. We assume a finite set of candidate teacher models, which we treat as black boxes. We design discriminative models that operate over lexical features. We find that $n$-gram similarity alone is unreliable for identifying teachers, but part-of-speech (PoS) templates preferred by student models mimic those of their teachers.
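The PoS-template fingerprint described in the abstract can be sketched concretely: represent each output as counts of short PoS tag sequences ("templates"), then score a student output against a candidate teacher by the similarity of their template distributions. The toy tag lexicon and the template length `n = 3` below are illustrative assumptions standing in for a real PoS tagger; they are not the authors' actual implementation.

```python
# Minimal sketch of PoS-template fingerprinting (assumptions: a toy
# hand-rolled tag lexicon in place of a real PoS tagger, and n = 3
# templates; neither is taken from the paper).
from collections import Counter

TOY_TAGS = {  # stand-in lexicon; a real system would use a trained tagger
    "the": "DET", "a": "DET", "model": "NOUN", "answer": "NOUN",
    "summarizes": "VERB", "generates": "VERB", "concise": "ADJ",
    "and": "CONJ", "it": "PRON",
}

def tag(text: str) -> list[str]:
    """Map each word to a PoS tag; unknown words get the fallback tag X."""
    return [TOY_TAGS.get(w.lower(), "X") for w in text.split()]

def pos_templates(text: str, n: int = 3) -> Counter:
    """Count length-n PoS tag sequences ('templates') in a text."""
    tags = tag(text)
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two template count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (sum(v * v for v in a.values()) *
            sum(v * v for v in b.values())) ** 0.5
    return dot / norm if norm else 0.0

# A student output shares its teacher's syntactic template even when
# the surface words differ, while an unrelated text does not.
student = "The model summarizes the concise answer"
teacher_a = "A model generates the concise answer"   # same PoS sequence
teacher_b = "it and it and it and it"                # different syntax

sim_a = similarity(pos_templates(student), pos_templates(teacher_a))
sim_b = similarity(pos_templates(student), pos_templates(teacher_b))
print(sim_a, sim_b)
```

Here `student` and `teacher_a` use different words but identical tag sequences, so their template similarity is maximal, while `teacher_b` shares no templates; attribution would pick the highest-scoring candidate teacher.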
Problem

Research questions and friction points this paper is trying to address.

Identify teacher models from student outputs
Detect footprints of teacher LLMs in distillation
Infer teachers violating proprietary model terms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identifies teacher models via outputs
Uses POS templates for mimicry detection
Designs discriminative models with lexical features
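The last point, a discriminative model over lexical features, can be sketched as a nearest-profile classifier: each candidate teacher is summarized by simple lexical statistics and a student output is attributed to the closest profile. The two features used here (mean word length and type-token ratio) and the example texts are illustrative assumptions, not the paper's actual feature set.

```python
# Hedged sketch of discriminative teacher attribution from lexical
# statistics (assumed features: mean word length and type-token ratio;
# the paper's real features and model are richer).

def lexical_features(text: str) -> tuple[float, float]:
    """Compute a tiny lexical profile of a text."""
    words = text.lower().split()
    mean_len = sum(len(w) for w in words) / len(words)
    ttr = len(set(words)) / len(words)  # type-token ratio
    return mean_len, ttr

def attribute(student: str, teachers: dict[str, str]) -> str:
    """Return the candidate teacher with the closest lexical profile."""
    sf = lexical_features(student)

    def dist(name: str) -> float:
        tf = lexical_features(teachers[name])
        return sum((x - y) ** 2 for x, y in zip(sf, tf))

    return min(teachers, key=dist)

# Hypothetical candidate pool: one verbose and one terse teacher style.
teachers = {
    "verbose_teacher": "consequently the comprehensive explanation "
                       "elaborates substantially",
    "terse_teacher": "yes it is so",
}
student_out = "thus the detailed summary expands considerably"
guess = attribute(student_out, teachers)
print(guess)
```

In this toy pool the student's long-word style sits closer to the verbose teacher's profile, so the classifier attributes it there; the paper's model would combine such lexical statistics with the PoS-template features.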