🤖 AI Summary
Rigorous empirical evidence on the productivity impact of LLM-based programming assistants in real-world software engineering remains scarce.
Method: This study is the first to apply causal inference to empirically assess the net effect of the AI programming assistant Cursor on authentic software development. Leveraging panel data from GitHub open-source projects, we innovatively combine a difference-in-differences (DID) design with panel generalized method of moments (GMM) estimation to isolate Cursor’s causal impact.
Contribution/Results: We find that Cursor significantly accelerates initial development speed (+18.3%, p < 0.01), yet concurrently increases static analysis warnings by 22.7% and elevates cyclomatic complexity persistently—ultimately leading to a decline in medium-to-long-term development velocity (−9.4% after six months). This reveals a previously undocumented “efficiency–quality trade-off” paradox in AI-assisted programming, providing critical empirical guidance for tool design and engineering practice.
📝 Abstract
Large language models (LLMs) have demonstrated the promise to revolutionize the field of software engineering. Among other things, LLM agents are rapidly gaining momentum in their application to software development, with practitioners claiming a multifold productivity increase after adoption. Yet, empirical evidence is lacking around these claims. In this paper, we estimate the causal effect of adopting a widely popular LLM agent assistant, namely Cursor, on development velocity and software quality. The estimation is enabled by a state-of-the-art difference-in-differences design comparing Cursor-adopting GitHub projects with a matched control group of similar GitHub projects that do not use Cursor. We find that the adoption of Cursor leads to a significant, large, but transient increase in project-level development velocity, along with a significant and persistent increase in static analysis warnings and code complexity. Further panel generalized method of moments estimation reveals that the increase in static analysis warnings and code complexity acts as a major factor causing long-term velocity slowdown. Our study carries implications for software engineering practitioners, LLM agent assistant designers, and researchers.