🤖 AI Summary
This paper addresses the “teacher attribution” problem in model distillation: can the teacher large language model (LLM) used for distillation be identified solely from the student model’s outputs? To tackle this, we propose a novel pedagogical fingerprint based on part-of-speech (PoS) templates, showing that PoS sequences in student outputs robustly inherit teacher-specific syntactic preferences and offer better discriminability and robustness than conventional n-gram similarity. Under a black-box teacher assumption, we design a lightweight discriminative model that jointly encodes PoS templates and lexical statistical features. Experiments across summarization, question answering, and instruction-following tasks achieve high teacher identification accuracy. Our results demonstrate that PoS templates constitute a generalizable, low-overhead, and highly informative paradigm for teacher provenance, with significant implications for LLM copyright protection and regulatory compliance auditing.
📝 Abstract
Model distillation -- using outputs from a large teacher model to teach a small student model -- is a practical means of creating efficient models for a particular task. We ask: Can we identify a student's teacher based on the student's outputs? Such "footprints" left by teacher LLMs would be interesting artifacts. Beyond this, reliable teacher inference may have practical implications as actors seek to distill specific capabilities of massive proprietary LLMs into deployed smaller LMs, potentially violating terms of service. We consider practical task distillation targets including summarization, question answering, and instruction-following. We assume a finite set of candidate teacher models, which we treat as black boxes. We design discriminative models that operate over lexical features. We find that $n$-gram similarity alone is unreliable for identifying teachers, but part-of-speech (PoS) templates preferred by student models mimic those of their teachers.
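To make the PoS-template idea concrete, here is a minimal sketch of teacher attribution via PoS n-gram profiles. This is an illustration only, not the paper's method: the paper trains a discriminative model over PoS templates and lexical features, whereas this toy simply attributes a student to the candidate teacher with the most similar PoS-trigram frequency profile (cosine similarity). The tag sequences and teacher names are hypothetical; in real use, model outputs would first be run through a PoS tagger.

```python
from collections import Counter
from math import sqrt

def pos_ngrams(tags, n=3):
    """Count PoS n-grams ("templates") in a sequence of PoS tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def attribute(student_profile, teacher_profiles):
    """Return the candidate teacher whose PoS-template profile is closest."""
    return max(teacher_profiles, key=lambda t: cosine(student_profile, teacher_profiles[t]))

# Toy PoS tag sequences standing in for tagged model outputs (hypothetical).
teacher_a = ["DT", "JJ", "NN", "VBZ", "DT", "NN"] * 5
teacher_b = ["PRP", "VBP", "IN", "DT", "NN", "RB"] * 5
student   = ["DT", "JJ", "NN", "VBZ", "DT", "JJ", "NN"] * 5

profiles = {"teacher_a": pos_ngrams(teacher_a), "teacher_b": pos_ngrams(teacher_b)}
print(attribute(pos_ngrams(student), profiles))  # → teacher_a
```

The student here reuses teacher A's syntactic templates (e.g. the DT-JJ-NN noun-phrase pattern), so its trigram profile overlaps A's and not B's, which is the intuition behind PoS templates as a distillation fingerprint.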