Effective LLM-Driven Code Generation with Pythoness

📅 2025-01-03
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
AI-generated code lacks guarantees of correctness and reliability, leaving developers to optimize, integrate, and maintain it by hand. To address these challenges, this paper introduces Pythoness, a domain-specific language (DSL) embedded in a host language that lets developers program with large language models (LLMs) at the level of behavioral specifications rather than generated code. Developers specify functions, classes, or entire programs via unit tests and property-based tests, expressed formally or in natural language; guided by these specifications, Pythoness generates code that passes the tests and can be continuously checked during execution. An evaluation of a prototype implementation demonstrates that combining tests with code generation yields higher-quality code than specifications alone.

πŸ“ Abstract
The advent of large language models (LLMs) has paved the way for a new era of programming tools with both significant capabilities and risks, as the generated code lacks guarantees of correctness and reliability. Developers using LLMs currently face the difficult task of optimizing, integrating, and maintaining code generated by AI. We propose an embedded domain-specific language (DSL), Pythoness, to address those challenges. In Pythoness, developers program with LLMs at a higher level of abstraction. Rather than interacting directly with generated code, developers using Pythoness operate at the level of behavioral specifications when writing functions, classes, or an entire program. These specifications can take the form of unit tests and property-based tests, which may be expressed formally or in natural language. Guided by these specifications, Pythoness generates code that both passes the tests and can be continuously checked during execution. We posit that the Pythoness approach lets developers harness the full potential of LLMs for code generation while substantially mitigating their inherent risks. We describe our current prototype implementation of Pythoness and demonstrate that it can successfully leverage a combination of tests and code generation to yield higher quality code than specifications alone.
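The abstract's workflow, in which tests and natural-language specifications guide generation and candidates are accepted only if they pass, can be sketched as a small Python loop. The `spec` decorator, its parameters, and the stubbed candidate generator below are illustrative assumptions for exposition, not the actual Pythoness API; a real system would prompt an LLM with the description and feed failing-test results back into the next prompt.

```python
# Conceptual sketch of a test-guided code-generation loop.
# NOTE: `spec`, its parameters, and `generate_candidates` are
# hypothetical illustrations, not the real Pythoness API.
from typing import Callable, Iterator


def generate_candidates(description: str) -> Iterator[Callable]:
    # Stand-in for an LLM: yields candidate implementations.
    # A real system would prompt a model with `description` plus
    # feedback from any tests the previous candidate failed.
    yield lambda xs: max(xs)               # plausible but wrong
    yield lambda xs: sorted(xs)[1]         # also wrong (duplicates)
    yield lambda xs: sorted(set(xs))[-2]   # satisfies the tests below


def spec(description: str, tests: list[Callable[[Callable], bool]]):
    """Replace the decorated stub with the first candidate
    implementation that passes every supplied test."""
    def decorator(_stub: Callable) -> Callable:
        for candidate in generate_candidates(description):
            if all(test(candidate) for test in tests):
                return candidate
        raise RuntimeError("no candidate satisfied the specification")
    return decorator


@spec(
    "Return the second-largest distinct value in a list "
    "with at least two distinct values.",
    tests=[
        lambda f: f([3, 1, 4, 1, 5]) == 4,
        lambda f: f([2, 2, 7]) == 2,
    ],
)
def second_largest(xs):
    ...  # stub body; replaced by a test-passing candidate


print(second_largest([10, 9, 8]))  # -> 9
```

The tests double as both the prompt's behavioral specification and the acceptance check, which is the closed loop the abstract describes; runtime checking would additionally re-validate behavior during execution rather than only at generation time.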
Problem

Research questions and friction points this paper is trying to address.

AI-generated code
reliability issues
developer efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pythoness
AI-generated code
code quality enhancement
Kyla H. Levin
University of Massachusetts Amherst
Kyle Gwilt
Williams College
Emery D. Berger
Professor of Computer Science, University of Massachusetts Amherst; Amazon Scholar; ACM Fellow
Programming Languages, Systems, Performance, Security, Reliability
Stephen N. Freund
Williams College