Do LLMs Provide Links to Code Similar to what they Generate? A Study with Gemini and Bing CoPilot

📅 2025-01-21

📈 Citations: 0

✨ Influential: 0

career value

181K/year

🤖 AI Summary

This study identifies a systemic risk—termed “attribution debt”—in large language model (LLM) code generation: external source links provided with generated code frequently fail to accurately trace back to semantically or structurally similar original implementations. Method: We conduct the first large-scale, cross-lingual (6 languages) empirical evaluation (N=437) of source attribution in Bing Copilot and Google Gemini, combining automated code similarity analysis (CodeBLEU, AST matching) with rigorous human validation. Contribution/Results: Only 66% of Copilot and 28% of Gemini source links point to genuinely similar code; the majority are weakly related or misleading. This work provides the first quantitative evidence that current LLM code attribution mechanisms are severely inadequate, undermining foundational assumptions for trustworthy code reuse, compliance auditing, and intellectual property provenance. It establishes critical benchmarks and a methodological framework for designing and evaluating traceable AI-powered coding tools.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) are currently used for various software development tasks, including generating code snippets to solve specific problems. Unlike reuse from the Web, LLMs are limited in providing provenance information about the generated code, which may have important trustworthiness and legal consequences. While LLM-based assistants may provide external links that are"related"to the generated code, we do not know how relevant such links are. This paper presents the findings of an empirical study assessing the extent to which 243 and 194 code snippets, across six programming languages, generated by Bing CoPilot and Google Gemini, likely originate from the links provided by these two LLM-based assistants. The study leverages automated code similarity assessments with thorough manual analysis. The study's findings indicate that the LLM-based assistants provide a mix of relevant and irrelevant links having a different nature. Specifically, although 66% of the links from Bing CoPilot and 28% from Google Gemini are relevant, LLMs-based assistants still suffer from serious"provenance debt".

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Code Snippet Generation

Source Link Credibility

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models

Code Generation

Transparency and Reliability

🔎 Similar Papers

CodeMirage: Hallucinations in Code Generated by Large Language Models