Understanding Specification-Driven Code Generation with LLMs: An Empirical Study Design

📅 2026-01-07
🏛️ arXiv.org
🤖 AI Summary
This study addresses the poorly understood mechanisms of human-AI collaboration in specification-driven software development with large language models (LLMs). We propose CURRANTE, a structured three-stage collaborative paradigm that guides developers through sequential refinement of requirements specifications, test cases, and function implementations. Implemented as a Visual Studio Code extension, CURRANTE integrates LLM assistance, fine-grained interaction logging, and automated test-based evaluation. By collecting interaction data and multidimensional performance metrics (including pass rates and completion time) on medium-difficulty tasks from LiveCodeBench, our work provides the first systematic empirical analysis of how iterative specification and testing dynamically influence LLM-generated code quality. These findings offer evidence-based insights for designing effective AI-augmented programming environments.

📝 Abstract
Large Language Models (LLMs) are increasingly integrated into software development workflows, yet their behavior in structured, specification-driven processes remains poorly understood. This paper presents an empirical study design using CURRANTE, a Visual Studio Code extension that enables a human-in-the-loop workflow for LLM-assisted code generation. The tool guides developers through three sequential stages (Specification, Tests, and Function), allowing them to define requirements, generate and refine test suites, and produce functions that satisfy those tests. Participants will solve medium-difficulty problems from the LiveCodeBench dataset, while the tool records fine-grained interaction logs, effectiveness metrics (e.g., pass rate, all-pass completion), efficiency indicators (e.g., time-to-pass), and iteration behaviors. The study aims to analyze how human intervention in specification and test refinement influences the quality and dynamics of LLM-generated code. The results will provide empirical insights into the design of next-generation development environments that align human reasoning with model-driven code generation.
Problem

Research questions and friction points this paper is trying to address.

specification-driven code generation
Large Language Models
human-in-the-loop
empirical study
code generation quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

specification-driven code generation
human-in-the-loop
LLM-assisted programming
empirical study design
test-guided development
Giovanni Rosa
Universidad Rey Juan Carlos
AI for Software Engineering · Software Quality · Software Maintenance · Empirical Software Engineering
David Moreno-Lumbreras
Assistant Professor
Gregorio Robles
Escuela de Ingeniería de Fuenlabrada, Universidad Rey Juan Carlos, Fuenlabrada, Spain
Jesús M. González-Barahona
Escuela de Ingenier\'ia de Fuenlabrada, Universidad Rey Juan Carlos, Fuenlabrada, Spain