🤖 AI Summary
This work addresses the instability, non-determinism, and weak adherence to development discipline that large language models exhibit in unconstrained code generation. To mitigate these issues, the authors propose a structured workflow grounded in Test-Driven Development (TDD), formalizing TDD principles into a machine-readable manifesto for the first time and integrating it within a multi-agent prompt orchestration framework that enforces a clear separation between code generation and validation responsibilities. The approach employs a layered architecture that applies structured prompts, phase ordering, bounded repair loops, validation gates, and atomic change control across the planning, generation, repair, and validation stages. The authors describe the architecture and argue that encoding this discipline into the workflow is a promising way to improve the stability, reproducibility, and adherence to established software engineering practices of generated code.
📝 Abstract
Large language models (LLMs) accelerate software development but often exhibit instability, non-determinism, and weak adherence to development discipline in unconstrained workflows. While test-driven development (TDD) provides a structured Red-Green-Refactor process, existing LLM-based approaches typically use tests as auxiliary inputs rather than enforceable process constraints. We present an AI-native TDD framework that operationalizes classical TDD principles as structured prompt-level and workflow-level governance mechanisms. Extracted principles are formalized in a machine-readable manifesto and distributed across planning, generation, repair, and validation stages within a layered architecture that separates model proposal from deterministic engine authority. The system enforces phase ordering, bounded repair loops, validation gates, and atomic mutation control to improve stability and reproducibility. We describe the architecture and argue that encoding software engineering discipline directly into prompt orchestration offers a promising direction for reliable LLM-assisted development.
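To make the governance mechanisms concrete, the following is a minimal sketch, not the authors' implementation, of how a deterministic engine might enforce phase ordering, a bounded repair loop, a validation gate, and atomic mutation around a model that only proposes changes. Every name here (`Engine`, `propose`, `validate`, `max_repairs`, `fake_model`, `fake_tests`) is a hypothetical stand-in chosen for illustration; the abstract does not specify any API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Workspace = Dict[str, str]  # file name -> source text (assumed representation)


@dataclass
class Engine:
    """Hypothetical deterministic engine: the model proposes, the engine decides."""
    propose: Callable[[Workspace, str], Workspace]  # stand-in for an LLM call
    validate: Callable[[Workspace], bool]           # stand-in for a test runner
    max_repairs: int = 3                            # bound on the repair loop
    log: List[str] = field(default_factory=list)

    def run(self, workspace: Workspace, test_name: str) -> Workspace:
        # Phase ordering (Red): the target test must fail before generation starts.
        if self.validate(workspace):
            self.log.append("red:rejected")
            return workspace
        self.log.append("red:ok")

        feedback = test_name
        # Green: one generation attempt plus at most max_repairs repair attempts.
        for attempt in range(1 + self.max_repairs):
            # The model sees a copy, so it cannot mutate the real workspace.
            candidate = {**workspace, **self.propose(dict(workspace), feedback)}
            # Validation gate: only a passing candidate is ever committed.
            if self.validate(candidate):
                self.log.append(f"green:ok:attempt{attempt}")
                return candidate  # atomic commit: all proposed changes or none
            self.log.append(f"green:failed:attempt{attempt}")
            feedback = f"{test_name} (retry {attempt + 1})"

        # Budget exhausted: discard every candidate and keep the original state.
        self.log.append("green:abandoned")
        return workspace


# Usage with a fake model that is wrong once, then right.
attempts = iter(["def add(a, b): return a - b", "def add(a, b): return a + b"])


def fake_model(ws: Workspace, feedback: str) -> Workspace:
    return {"add.py": next(attempts)}


def fake_tests(ws: Workspace) -> bool:
    scope: dict = {}
    exec(ws.get("add.py", ""), scope)  # acceptable for a toy example only
    return "add" in scope and scope["add"](2, 3) == 5


workspace = {"add.py": ""}
engine = Engine(propose=fake_model, validate=fake_tests, max_repairs=3)
result = engine.run(workspace, "test_add")
print(engine.log)
```

In this sketch the first proposal fails the gate and is discarded, the second passes and is committed, and the original workspace is never modified in place. A model that never produces a passing candidate leaves the workspace unchanged once the repair budget runs out, which is one plausible reading of "bounded repair loops" combined with "atomic mutation control".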