Effective LLM-Driven Code Generation with Pythoness

📅 2025-01-03
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
AI-generated code lacks guarantees of correctness and reliability, leaving developers to optimize, integrate, and maintain it by hand. To address these challenges, this paper introduces Pythoness, a domain-specific language (DSL) embedded in a host language that lets developers program with large language models (LLMs) at the level of behavioral specifications rather than generated code. Developers specify functions, classes, or entire programs via unit tests and property-based tests, expressed formally or in natural language; guided by these specifications, Pythoness generates code that passes the tests and can be continuously checked during execution. An evaluation of a prototype implementation demonstrates that combining tests with code generation yields higher-quality code than specifications alone.

πŸ“ Abstract
The advent of large language models (LLMs) has paved the way for a new era of programming tools with both significant capabilities and risks, as the generated code lacks guarantees of correctness and reliability. Developers using LLMs currently face the difficult task of optimizing, integrating, and maintaining code generated by AI. We propose an embedded domain-specific language (DSL), Pythoness, to address those challenges. In Pythoness, developers program with LLMs at a higher level of abstraction. Rather than interacting directly with generated code, developers using Pythoness operate at the level of behavioral specifications when writing functions, classes, or an entire program. These specifications can take the form of unit tests and property-based tests, which may be expressed formally or in natural language. Guided by these specifications, Pythoness generates code that both passes the tests and can be continuously checked during execution. We posit that the Pythoness approach lets developers harness the full potential of LLMs for code generation while substantially mitigating their inherent risks. We describe our current prototype implementation of Pythoness and demonstrate that it can successfully leverage a combination of tests and code generation to yield higher quality code than specifications alone.
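The abstract's workflow, in which tests and natural-language specifications guide generation and candidates are accepted only if they pass, can be sketched as a small Python loop. The `spec` decorator, its parameters, and the stubbed candidate generator below are illustrative assumptions for exposition, not the actual Pythoness API; a real system would prompt an LLM with the description and feed failing-test results back into the next prompt.

```python
# Conceptual sketch of a test-guided code-generation loop.
# NOTE: `spec`, its parameters, and `generate_candidates` are
# hypothetical illustrations, not the real Pythoness API.
from typing import Callable, Iterator


def generate_candidates(description: str) -> Iterator[Callable]:
    # Stand-in for an LLM: yields candidate implementations.
    # A real system would prompt a model with `description` plus
    # feedback from any tests the previous candidate failed.
    yield lambda xs: max(xs)               # plausible but wrong
    yield lambda xs: sorted(xs)[1]         # also wrong (duplicates)
    yield lambda xs: sorted(set(xs))[-2]   # satisfies the tests below


def spec(description: str, tests: list[Callable[[Callable], bool]]):
    """Replace the decorated stub with the first candidate
    implementation that passes every supplied test."""
    def decorator(_stub: Callable) -> Callable:
        for candidate in generate_candidates(description):
            if all(test(candidate) for test in tests):
                return candidate
        raise RuntimeError("no candidate satisfied the specification")
    return decorator


@spec(
    "Return the second-largest distinct value in a list "
    "with at least two distinct values.",
    tests=[
        lambda f: f([3, 1, 4, 1, 5]) == 4,
        lambda f: f([2, 2, 7]) == 2,
    ],
)
def second_largest(xs):
    ...  # stub body; replaced by a test-passing candidate


print(second_largest([10, 9, 8]))  # -> 9
```

The tests double as both the prompt's behavioral specification and the acceptance check, which is the closed loop the abstract describes; runtime checking would additionally re-validate behavior during execution rather than only at generation time.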
Problem

Research questions and friction points this paper is trying to address.

AI-generated code
reliability issues
developer efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pythoness
AI-generated code
code quality enhancement
Kyla H. Levin
University of Massachusetts Amherst
Kyle Gwilt
Williams College
Emery D. Berger
Professor of Computer Science, University of Massachusetts Amherst; Amazon Scholar; ACM Fellow
Programming Languages, Systems, Performance, Security, Reliability
Stephen N. Freund
Williams College