🤖 AI Summary
This work addresses zero-shot source attribution of code generated by large language models (LLMs). We formulate attribution as a statistical distribution testing problem and propose the first nonparametric hypothesis testing framework—requiring no training, model fine-tuning, or access to internal parameters—and relying solely on generated code samples and their estimated probability densities. By circumventing the intractability of direct sample comparison in high-dimensional discrete spaces, our method enables genuine zero-shot model identification. Evaluated across prominent code LMs—including DeepSeek-Coder, CodeGemma, and Stable-Code—it achieves AUROC ≥ 0.9 using only ~2,000 samples, substantially outperforming existing black-box attribution approaches. Our core contribution is the establishment of the first distribution-testing-based paradigm for LLM code attribution, combining theoretical rigor with practical efficiency.
📝 Abstract
A growing fraction of all code is sampled from Large Language Models (LLMs). We investigate the problem of attributing code generated by language models using hypothesis testing to leverage established techniques and guarantees. Given a set of samples $S$ and a suspect model $\mathcal{L}^*$, our goal is to assess the likelihood of $S$ originating from $\mathcal{L}^*$. Due to the curse of dimensionality, this is intractable when only samples from the LLM are given: to circumvent this, we use both samples and density estimates from the LLM, a form of access commonly available.
We introduce $\mathsf{Anubis}$, a zero-shot attribution tool that frames attribution as a distribution testing problem. Our experiments on a benchmark of code samples show that $\mathsf{Anubis}$ achieves high AUROC scores ($\ge 0.9$) when distinguishing between LLMs like DeepSeek-Coder, CodeGemma, and Stable-Code using only $\approx 2000$ samples.
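To make the setting concrete, here is a minimal sketch of how attribution via distribution testing might look when, as the abstract assumes, we have both samples from the suspect model and its density (log-probability) estimates. This is an illustrative stand-in, not the paper's actual test: it compares the log-likelihoods that the suspect model assigns to the observed samples against those it assigns to its own reference generations, using a hand-rolled two-sample Kolmogorov-Smirnov statistic. The function names (`suspect_logp`, `attribution_score`) are hypothetical.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the empirical CDFs of value lists a and b (0 = identical
    empirical distributions, 1 = fully separated)."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    d = 0.0
    i = j = 0
    for v in sorted(set(a + b)):
        # Advance each pointer past all values <= v, so i/n and j/m
        # are the empirical CDFs of a and b evaluated at v.
        while i < n and a[i] <= v:
            i += 1
        while j < m and b[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def attribution_score(suspect_logp, observed_samples, reference_samples):
    """Score how unlike the suspect model the observed samples look.

    suspect_logp: callable giving the suspect model's log-density
        of a code sample (the density access the abstract assumes).
    reference_samples: fresh samples drawn from the suspect model itself.
    Returns a value in [0, 1]; larger means the observed samples'
    log-likelihood distribution diverges more from the model's own.
    """
    obs = [suspect_logp(s) for s in observed_samples]
    ref = [suspect_logp(s) for s in reference_samples]
    return ks_statistic(obs, ref)
```

Reducing each high-dimensional code sample to a scalar log-likelihood is one way to dodge the curse of dimensionality the abstract mentions: the test then operates on one-dimensional empirical distributions, where classical nonparametric tests are well understood.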