🤖 AI Summary
This study investigates cross-national disparities in generative AI adoption in programming and its impact on developer productivity and innovation. Method: Using 80 million GitHub commits (2018–2024) from 200,000 developers, we train a neural classifier to identify AI-generated Python functions, enabling large-scale, code-level detection of AI usage. Combining within-developer fixed-effects models with occupational task and wage data, we estimate the effect of AI adoption intensity (as distinct from tool accessibility). Contribution/Results: By December 2024, AI-generated code accounts for an estimated 30.1% of Python functions from U.S. contributors versus 11.7% in China. Moving to 30% AI use raises quarterly commits by 2.4%. We estimate the annual value of AI-assisted coding in the United States at $9.6–$14.4 billion, rising to $64–$96 billion under the higher productivity effects reported by randomized controlled trials. AI adoption also increases the number of new libraries and library combinations that developers use.
📝 Abstract
Generative coding tools promise big productivity gains, but uneven uptake could widen skill and income gaps. We train a neural classifier to spot AI-generated Python functions in 80 million GitHub commits (2018-2024) by 200,000 developers and track how fast, and where, these tools take hold. By December 2024, AI wrote an estimated 30.1% of Python functions from U.S. contributors, versus 24.3% in Germany, 23.2% in France, 21.6% in India, 15.4% in Russia and 11.7% in China. Newer GitHub users use AI more than veterans, while male and female developers adopt at similar rates. Within-developer fixed-effects models show that moving to 30% AI use raises quarterly commits by 2.4%. Coupling this effect with occupational task and wage data puts the annual value of AI-assisted coding in the United States at $9.6-$14.4 billion, rising to $64-$96 billion if we assume the higher estimates of productivity effects reported by randomized controlled trials. Moreover, generative AI prompts learning and innovation, leading to increases in the number of new libraries and library combinations that programmers use. In short, AI usage is already widespread but highly uneven, and the intensity of use, not only access, drives measurable gains in output and exploration.
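To make the "within-developer fixed-effects" idea concrete, here is a minimal sketch of a one-regressor within estimator in plain Python. It is an illustration under stated assumptions, not the paper's specification: the function name `within_estimator`, the log-commits outcome, the synthetic panel, and the coefficient `TRUE_BETA = 0.08` are all hypothetical, and the coefficient is chosen only so that a 30% AI share maps to roughly the 2.4% effect quoted above. The paper's actual model may include time effects, controls, and a different functional form.

```python
from collections import defaultdict


def within_estimator(rows):
    """Estimate the slope of y on x after removing developer-specific means.

    rows: iterable of (developer_id, ai_share, log_commits) tuples.
    Demeaning within each developer absorbs any time-invariant
    developer trait (skill, employer, baseline activity).
    """
    groups = defaultdict(list)
    for dev, x, y in rows:
        groups[dev].append((x, y))

    num = 0.0  # sum of demeaned x times demeaned y
    den = 0.0  # sum of squared demeaned x
    for obs in groups.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - mean_x) * (y - mean_y)
            den += (x - mean_x) ** 2

    if den == 0.0:
        raise ValueError("no within-developer variation in AI share")
    return num / den


# Synthetic, noise-free panel (hypothetical values, for illustration only).
TRUE_BETA = 0.08  # assumed effect of AI share on log quarterly commits
baselines = {"dev_a": 2.0, "dev_b": 3.5, "dev_c": 1.2}  # developer fixed effects
shares = {
    "dev_a": [0.0, 0.1, 0.3, 0.4],
    "dev_b": [0.2, 0.2, 0.5, 0.6],
    "dev_c": [0.0, 0.0, 0.1, 0.3],
}
panel = [
    (dev, x, baselines[dev] + TRUE_BETA * x)
    for dev, xs in shares.items()
    for x in xs
]

# With no noise, the estimator recovers TRUE_BETA up to floating-point error,
# even though the developers have very different baseline activity levels.
beta_hat = within_estimator(panel)

# Implied change in log commits when AI share moves from 0 to 0.30.
effect_at_30 = beta_hat * 0.30
```

The point of the design is that the developers' very different baselines never enter the estimate: only changes in a developer's own AI share over time, paired with changes in that same developer's output, identify the slope.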