🤖 AI Summary
This study addresses the challenge of efficiently attributing the outputs of instruction-tuned large language models (LLMs) to their training data. Existing attribution methods are sensitive to retrieval noise and lack stability. To overcome these limitations, two approaches are proposed: (1) a similarity metric grounded in in-context learning and prompt engineering, and (2) a mixture-distribution model that frames contribution estimation as a matrix factorization problem, improving both robustness and interpretability. An empirical comparison shows that the mixture-model approach yields markedly more stable and reliable data contribution estimates than the similarity-based alternative, particularly under retrieval noise. The method offers a practical pathway toward LLM traceability and data provenance assessment, both critical for copyright evaluation and responsible AI deployment.
📝 Abstract
We investigate the use of in-context learning and prompt engineering to estimate the contributions of training data in the outputs of instruction-tuned large language models (LLMs). We propose two novel approaches: (1) a similarity-based approach that measures the difference between LLM outputs with and without provided context, and (2) a mixture distribution model approach that frames the problem of identifying contribution scores as a matrix factorization task. Our empirical comparison demonstrates that the mixture model approach is more robust to retrieval noise in in-context learning, providing a more reliable estimation of data contributions.
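To make the mixture-distribution framing concrete, here is a minimal sketch of the underlying idea: an observed output distribution is modeled as a nonnegative combination of candidate source distributions, and the combination weights serve as contribution scores. This is an illustrative stand-in, not the paper's actual method — the token distributions, the `estimate_contributions` function, and the use of plain EM updates for known mixture components are all assumptions made for this example.

```python
def estimate_contributions(observed, sources, iters=500):
    """Estimate nonnegative weights w (summing to 1) such that
    observed[t] ~= sum_k w[k] * sources[k][t].

    observed: token-probability distribution of an LLM output.
    sources:  one token-probability distribution per candidate
              training document (hypothetical setup).
    Uses standard EM updates for mixture weights with fixed components.
    """
    k = len(sources)
    w = [1.0 / k] * k  # start from a uniform mixture
    for _ in range(iters):
        new_w = [0.0] * k
        for t, p_obs in enumerate(observed):
            # current mixture probability of token t
            mix = sum(w[j] * sources[j][t] for j in range(k))
            if mix <= 0.0:
                continue
            # distribute the observed mass over sources by responsibility
            for j in range(k):
                new_w[j] += p_obs * w[j] * sources[j][t] / mix
        total = sum(new_w)
        w = [x / total for x in new_w]
    return w


# Toy check: an output built as a 70/30 blend of two sources
sources = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
observed = [0.7 * a + 0.3 * b for a, b in zip(*sources)]
weights = estimate_contributions(observed, sources)
```

On a clean synthetic blend like the one above, the recovered weights converge to the true mixing proportions; the paper's point is that this style of inference degrades more gracefully than output-similarity comparisons when the retrieved context is noisy.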