ChipMATE: Multi-Agent Training via Reinforcement Learning for Enhanced RTL Generation

📅 2026-05-12

🤖 AI Summary

Existing API-based RTL generation approaches rely on golden testbenches, depend on closed-source interfaces, and cannot leverage private RTL codebases, so they fall short of industrial requirements. This work proposes ChipMATE, the first self-trained multi-agent framework for RTL generation. It enables closed-loop training without golden test cases by having a Verilog agent and a Python reference-model agent verify each other's outputs. Key innovations include an unsupervised multi-agent co-generation and verification mechanism, a backtracking-based inference workflow that halts error propagation across turns, and a two-stage training strategy (individual agent training followed by joint team training) that incorporates private data. Training is supported by a hybrid data-generation pipeline that yields 64.4K high-quality reference-model training samples. On VerilogEval V2, ChipMATE achieves pass@1 accuracies of 75.0% and 80.1% with 4B and 9B base models, respectively, outperforming all existing self-trained methods and even surpassing DeepSeek V4, which has 1600B parameters.
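The summary reports results as pass@1 but does not define the metric. Code-generation benchmarks commonly use the unbiased pass@k estimator introduced with HumanEval: with n samples per problem of which c pass, pass@k = 1 - C(n-c, k) / C(n, k), which reduces to c/n for k = 1. It is an assumption here that ChipMATE's numbers are computed this way. The Python sketch below implements that estimator. The function names and the three-problem toy result list are hypothetical and are not data from the paper.

```python
from math import comb
from typing import Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples passes), given c of n passed."""
    if n - c < k:
        return 1.0  # every size-k subset must contain at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Sequence[Tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over a benchmark; results = [(n, c), ...]."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical toy benchmark: 3 problems, 10 samples each, with 10, 5 and 0 passing.
toy = [(10, 10), (10, 5), (10, 0)]
print(benchmark_pass_at_k(toy, 1))            # 0.5
print(round(benchmark_pass_at_k(toy, 5), 3))  # 0.665
```

Averaging the per-problem estimate, rather than pooling all samples, weights every benchmark problem equally regardless of how many samples were drawn for it.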

📝 Abstract

Existing API-based agentic systems for RTL code generation are fundamentally misaligned with industrial practice: they assume a golden testbench is available at generation time, rely on closed-source APIs incompatible with chip vendors' air-gapped security requirements, and cannot be trained on vendors' proprietary RTL codebases, leaving valuable internal data unused. Recent self-trained models address the deployment constraint but remain single-turn generators that overlook the critical role of verification in real industrial flows. To bridge these gaps, we present ChipMATE, the first self-trained multi-agent framework for RTL generation. Inspired by industrial practice, where correctness emerges from cross-comparison between independently written RTL modules and reference models, ChipMATE pairs a Verilog agent with a Python reference-model agent that mutually verify each other's outputs without any golden oracle. We design a backtrack-based inference workflow to prevent error propagation across turns, and a two-stage training pipeline that first trains each agent individually to saturate its code-generation capability, then trains the team jointly to collaborate effectively. To support training, we further build a hybrid data-generation framework that produces 64.4K high-quality reference-model training samples. ChipMATE achieves 75.0% and 80.1% pass@1 on VerilogEval V2 with 4B and 9B base models, outperforming all existing self-trained models and even DeepSeek V4 with 1600B parameters. Our code and model weights are publicly available at https://github.com/zhongkaiyu/ChipMATE.
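The abstract does not spell out how the two agents cross-check each other or what "backtrack-based" means in practice. The Python sketch below illustrates one plausible reading. Both models are driven with identical random stimuli and every disagreement is collected, a form of differential testing. A revision that increases the number of mismatches is then discarded, and the next turn starts again from the best pair seen so far. Everything in it is hypothetical: `cross_check`, `co_verify`, the `Reviser` interface, the toy 8-bit adder, and the scripted revisions. The "RTL" side is a Python callable standing in for a simulator run, not real Verilog.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Signals = Dict[str, int]
Model = Callable[[Signals], Signals]          # stimulus -> outputs
Mismatch = Tuple[Signals, Signals, Signals]   # (stimulus, rtl_out, ref_out)
# Hypothetical agent interface: propose a revised (rtl, ref) pair from mismatches.
Reviser = Callable[[Model, Model, List[Mismatch]], Tuple[Model, Model]]


def cross_check(rtl_sim: Model, ref_model: Model,
                n_vectors: int = 200, width: int = 8, seed: int = 0) -> List[Mismatch]:
    """Drive both models with identical random stimuli; return every disagreement."""
    rng = random.Random(seed)
    mismatches: List[Mismatch] = []
    for _ in range(n_vectors):
        stim = {"a": rng.randrange(1 << width), "b": rng.randrange(1 << width)}
        got, want = rtl_sim(stim), ref_model(stim)
        if got != want:
            mismatches.append((stim, got, want))
    return mismatches


def co_verify(rtl: Model, ref: Model, revise: Reviser,
              max_turns: int = 4) -> Tuple[Model, Model, Optional[int]]:
    """Revise until the models agree, backtracking whenever a turn makes things worse.

    Returns (rtl, ref, turn_of_agreement), or (best_rtl, best_ref, None) on failure.
    """
    best: Tuple[int, Model, Model] = (len(cross_check(rtl, ref)), rtl, ref)
    if best[0] == 0:
        return rtl, ref, 0
    for turn in range(1, max_turns + 1):
        _, cur_rtl, cur_ref = best  # always revise from the best-known pair
        new_rtl, new_ref = revise(cur_rtl, cur_ref, cross_check(cur_rtl, cur_ref))
        n_bad = len(cross_check(new_rtl, new_ref))
        if n_bad == 0:
            return new_rtl, new_ref, turn
        if n_bad < best[0]:
            best = (n_bad, new_rtl, new_ref)  # accept the improvement
        # otherwise backtrack: this turn's pair is dropped and never built upon
    return best[1], best[2], None


# Toy 8-bit adder: the "RTL" forgets to truncate the carry out.
def buggy_rtl(s: Signals) -> Signals:
    return {"y": s["a"] + s["b"]}


def ref(s: Signals) -> Signals:
    return {"y": (s["a"] + s["b"]) & 0xFF}


scripted = iter([
    lambda s: {"y": (s["a"] - s["b"]) & 0xFF},  # turn 1: a worse revision, rejected
    lambda s: {"y": (s["a"] + s["b"]) & 0xFF},  # turn 2: the correct fix
])


def scripted_revise(rtl_m: Model, ref_m: Model, mismatches: List[Mismatch]) -> Tuple[Model, Model]:
    return next(scripted), ref_m  # only the Verilog side is revised in this toy


rtl, ref_out, turn = co_verify(buggy_rtl, ref, scripted_revise)
print(turn)  # 2
```

Agreement between the two models is evidence, not proof, of correctness: both agents can share the same misreading of the specification. This is why the approach relies on the RTL module and the reference model being written independently.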

Problem

Research questions and friction points this paper is trying to address.

RTL generation

industrial practice

verification

air-gapped security

proprietary codebases

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent reinforcement learning

RTL generation

self-training