AI Summary
This study addresses two fundamental challenges in software engineering: (1) the difficulty of accurately decoding and clarifying developer intent, and (2) the lack of trustworthy verification and validation for AI-generated code in highly automated settings. We propose an intent-driven AI agent workflow framework that integrates large language models, static and dynamic program analysis tools, and a composable agent architecture to enable end-to-end autonomous decision-making, from requirements understanding and code generation to test synthesis, program repair, and architectural design. A key innovation is the introduction of an AI-driven hierarchical verification and validation mechanism, ensuring semantic-level intent alignment and code trustworthiness. Empirical evaluation demonstrates significant improvements in task completion accuracy and developer trust. The framework provides both theoretical foundations and practical implementation pathways toward building high-assurance, explainable, and auditable AI software engineers.
Abstract
AI agents have recently shown significant promise in software engineering. Much public attention has focused on code generation from Large Language Models (LLMs) via a prompt. However, software engineering is much more than programming, and AI agents go far beyond following the instructions given in a prompt.
At the code level, common software tasks include code generation, testing, and program repair. Design-level software tasks may include architecture exploration, requirements understanding, and requirements enforcement at the code level. Each of these software tasks involves micro-decisions that can be taken autonomously by an AI agent, aided by program analysis tools. This creates the vision of an AI software engineer, where the AI agent can be seen as a member of a development team.
Conceptually, the key to successfully developing trustworthy agentic AI-based software workflows will be to resolve the core difficulty in software engineering: the deciphering and clarification of developer intent. Specification inference, or deciphering the intent, thus lies at the heart of many software tasks, including software maintenance and program repair. A successful deployment of agentic technology into software engineering would involve making conceptual progress in such intent inference via agents.
Trusting the AI agent becomes a key concern as software engineering becomes more automated. Higher automation also leads to a higher volume of code being automatically generated and then integrated into codebases. To deal with this explosion, an emerging direction is AI-based verification and validation (V&V) of AI-generated code. We posit that agentic software workflows in the future will include such AI-based V&V.