🤖 AI Summary
Large language models (LLMs) often generate syntactically invalid, policy-violating, or non-scalable Infrastructure-as-Code (IaC) configurations when generating in a single pass, especially for cloud-native Terraform deployments. Method: We propose MACOG, a multi-agent collaborative framework for reliable, policy-compliant Terraform code generation. MACOG employs a modular agent architecture coordinated via a shared blackboard and a finite-state machine, integrating deployment feedback, constrained decoding, retrieval-augmented generation (RAG), and Open Policy Agent (OPA)-based policy validation into a closed-loop optimization pipeline. Contribution/Results: On the IaC-Eval benchmark, MACOG significantly outperforms single-agent baselines (74.02 points with GPT-5; 60.13 points with Gemini-2.5 Pro). Ablation studies confirm the critical roles of multi-agent collaboration, constrained decoding, and policy-driven feedback in achieving robust, compliant IaC generation.
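The blackboard-plus-finite-state-machine coordination described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the agent roles, state names, and stub behaviors (e.g., a reviewer that always approves) are assumptions standing in for LLM calls and real `terraform plan`/OPA checks.

```python
from enum import Enum, auto

# Illustrative sketch of blackboard + FSM orchestration (NOT the paper's
# actual code). Each agent reads/writes the shared board and returns the
# next state; a validation failure would route feedback to the Engineer.

class State(Enum):
    ARCHITECT = auto()
    ENGINEER = auto()
    REVIEW = auto()
    VALIDATE = auto()
    DONE = auto()

def architect(board):
    board["plan"] = "vpc + ec2 web tier"  # high-level design (stub)
    return State.ENGINEER

def engineer(board):
    # Draft Terraform; a real agent would condition on board["feedback"].
    board["hcl"] = 'resource "aws_instance" "web" {}'
    return State.REVIEW

def reviewer(board):
    board["review_ok"] = True  # stub: always approve
    return State.VALIDATE

def validator(board):
    # Stand-in for `terraform plan` + OPA policy checks.
    if board.get("review_ok"):
        return State.DONE
    board["feedback"] = "policy violation"  # close the loop on failure
    return State.ENGINEER

HANDLERS = {
    State.ARCHITECT: architect,
    State.ENGINEER: engineer,
    State.REVIEW: reviewer,
    State.VALIDATE: validator,
}

def run(max_steps=10):
    board, state = {}, State.ARCHITECT
    for _ in range(max_steps):  # bounded loop: FSM guarantees termination
        if state is State.DONE:
            break
        state = HANDLERS[state](board)
    return board
```

The bounded step budget mirrors why an FSM orchestrator helps: the closed feedback loop cannot cycle forever, and every hand-off between agents is an explicit, inspectable state transition.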
📝 Abstract
The increasing complexity of cloud-native infrastructure has made Infrastructure-as-Code (IaC) essential for reproducible and scalable deployments. While large language models (LLMs) have shown promise in generating IaC snippets from natural language prompts, their monolithic, single-pass generation approach often results in syntactic errors, policy violations, and unscalable designs. In this paper, we propose MACOG (Multi-Agent Code-Orchestrated Generation), a novel multi-agent LLM-based architecture for IaC generation that decomposes the task into modular subtasks handled by specialized agents: Architect, Provider Harmonizer, Engineer, Reviewer, Security Prover, Cost and Capacity Planner, DevOps, and Memory Curator. The agents interact via a shared blackboard and a finite-state orchestrator layer, and collectively produce Terraform configurations that are not only syntactically valid but also policy-compliant and semantically coherent. To ensure infrastructure correctness and governance, we incorporate Terraform plan for execution validation and Open Policy Agent (OPA) for customizable policy enforcement. We evaluate MACOG on the IaC-Eval benchmark, where it is the strongest enhancement across models: GPT-5 improves from 54.90 (RAG) to 74.02 and Gemini-2.5 Pro from 43.56 to 60.13, with concurrent gains on BLEU, CodeBERTScore, and an LLM-judge metric. Ablations show that constrained decoding and deployment feedback are critical: removing them drops IaC-Eval to 64.89 and 56.93, respectively.
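The correctness-and-governance gate described above (Terraform plan for execution validation plus OPA for policy enforcement) might be wired up roughly as below. This is a hedged sketch: `terraform validate`, `terraform plan -detailed-exitcode`, and `opa eval --fail-defined` are real CLI invocations, but the policy file name and the `data.terraform.deny` rule path are illustrative assumptions, not part of the paper.

```python
import subprocess

def classify_plan_exit(code: int) -> str:
    # `terraform plan -detailed-exitcode`: 0 = no changes, 2 = changes
    # pending, anything else = error.
    return {0: "no_changes", 2: "changes"}.get(code, "error")

def validate_config(workdir: str, policy: str = "policy.rego") -> list:
    """Return failure messages for the feedback loop; empty = passed.

    The policy filename and `data.terraform.deny` rule path are
    hypothetical examples, not MACOG's actual configuration.
    """
    failures = []

    # 1. Syntax/semantic check of the generated HCL.
    chk = subprocess.run(["terraform", "validate"], cwd=workdir,
                         capture_output=True, text=True)
    if chk.returncode != 0:
        failures.append("syntax: " + chk.stderr.strip())

    # 2. Execution validation: does the config plan cleanly?
    plan = subprocess.run(["terraform", "plan", "-detailed-exitcode"],
                          cwd=workdir, capture_output=True, text=True)
    if classify_plan_exit(plan.returncode) == "error":
        failures.append("plan: " + plan.stderr.strip())

    # 3. Policy enforcement: non-zero exit if any deny rule is defined.
    opa = subprocess.run(["opa", "eval", "-d", policy, "--fail-defined",
                          "data.terraform.deny"],
                         capture_output=True, text=True)
    if opa.returncode != 0:
        failures.append("policy: deny rule fired")

    return failures
```

Returning structured failure messages (rather than a boolean) is what makes the closed loop useful: the messages can be written back to the blackboard so the generating agent can repair the exact violation.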