🤖 AI Summary
Existing LLM code-evaluation methodologies rely heavily on synthetic benchmarks and neglect real-world development practices. Method: We propose a developer-grounded empirical evaluation framework, featuring a unified, automated platform for multi-model integration and execution validation, complemented by a mixed-methods survey of 60 software practitioners from 11 countries. Contribution/Results: Our systematic assessment quantifies LLM-generated code along three dimensions: functionality, syntactic correctness, and engineering practicality. We identify critical capability gaps, including API invocation, contextual modeling, and error recovery, and establish empirically grounded mappings between model capabilities and engineering deployability. The findings provide actionable, evidence-based guidance for LLM selection, prompt-engineering optimization, and IDE/toolchain integration in industrial settings.
📝 Abstract
Large Language Models (LLMs) have emerged as coding assistants capable of generating source code from natural language prompts. With the increasing adoption of LLMs in software development, academic research and industry-based projects have produced various tools, benchmarks, and metrics to evaluate the effectiveness of LLM-generated code. However, there is a lack of solutions evaluated through empirically grounded methods that incorporate practitioners' perspectives to assess functionality, syntax, and accuracy in real-world applications. To address this gap, we propose and develop a multi-model unified platform to generate and execute code from natural language prompts. We conducted a survey of 60 software practitioners from 11 countries across four continents, working in diverse professional roles and domains, to evaluate the usability, performance, strengths, and limitations of each model. The results present practitioners' feedback and insights into the use of LLMs in software development, including their strengths and weaknesses, key aspects overlooked by benchmarks and metrics, and a broader understanding of their practical applicability. These findings can help researchers and practitioners make informed decisions when systematically selecting and using LLMs in software development projects. Future research will focus on integrating more diverse models into the proposed system, incorporating additional case studies, and conducting developer interviews for deeper empirical insights into LLM-driven software development.