🤖 AI Summary
This work addresses zero-shot source attribution of code generated by large language models (LLMs). We formulate attribution as a statistical distribution testing problem and propose the first nonparametric hypothesis testing framework—requiring no training, model fine-tuning, or access to internal parameters—and relying solely on generated code samples and their estimated probability densities. By circumventing the intractability of direct sample comparison in high-dimensional discrete spaces, our method enables genuine zero-shot model identification. Evaluated across prominent code LMs—including DeepSeek-Coder, CodeGemma, and Stable-Code—it achieves AUROC ≥ 0.9 using only ~2,000 samples, substantially outperforming existing black-box attribution approaches. Our core contribution is the establishment of the first distribution-testing-based paradigm for LLM code attribution, combining theoretical rigor with practical efficiency.
📝 Abstract
A growing fraction of all code is sampled from Large Language Models (LLMs). We investigate the problem of attributing code generated by language models using hypothesis testing to leverage established techniques and guarantees. Given a set of samples $S$ and a suspect model $\mathcal{L}^*$, our goal is to assess the likelihood of $S$ originating from $\mathcal{L}^*$. Due to the curse of dimensionality, this is intractable when only samples from the LLM are given: to circumvent this, we use both samples and density estimates from the LLM, a form of access commonly available.
We introduce $\mathsf{Anubis}$, a zero-shot attribution tool that frames attribution as a distribution testing problem. Our experiments on a benchmark of code samples show that $\mathsf{Anubis}$ achieves high AUROC scores ($\ge 0.9$) when distinguishing between LLMs like DeepSeek-Coder, CodeGemma, and Stable-Code using only $\approx 2000$ samples.
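To make the setting concrete, here is a minimal sketch of how attribution via distribution testing might look when, as the abstract assumes, we have both samples from the suspect model and its density (log-probability) estimates. This is an illustrative stand-in, not the paper's actual test: it compares the log-likelihoods that the suspect model assigns to the observed samples against those it assigns to its own reference generations, using a hand-rolled two-sample Kolmogorov-Smirnov statistic. The function names (`suspect_logp`, `attribution_score`) are hypothetical.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the empirical CDFs of value lists a and b (0 = identical
    empirical distributions, 1 = fully separated)."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    i = j = 0
    for v in sorted(set(a + b)):
        # Advance each pointer past all values <= v, so i/n and j/m
        # are the empirical CDFs of a and b evaluated at v.
        while i < n and a[i] <= v:
            i += 1
        while j < m and b[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def attribution_score(suspect_logp, observed_samples, reference_samples):
    """Score how unlike the suspect model the observed samples look.

    suspect_logp: callable giving the suspect model's log-density
        of a code sample (the density access the abstract assumes).
    reference_samples: fresh samples drawn from the suspect model itself.
    Returns a value in [0, 1]; larger means the observed samples'
    log-likelihood distribution diverges more from the model's own.
    """
    obs = [suspect_logp(s) for s in observed_samples]
    ref = [suspect_logp(s) for s in reference_samples]
    return ks_statistic(obs, ref)
```

Reducing each high-dimensional code sample to a scalar log-likelihood is one way to dodge the curse of dimensionality the abstract mentions: the test then operates on one-dimensional empirical distributions, where classical nonparametric tests are well understood.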