AI Summary
In electronic design automation (EDA), the scarcity of high-quality annotated data severely limits the accuracy and domain expertise of open-source large language models (LLMs) in retrieval-augmented generation (RAG). To address this, we propose RAFT, an EDA-oriented, synthetic-data-driven retrieval-augmented fine-tuning framework. RAFT introduces retrieval-augmented few-shot (RAFS) synthesis, the first method to generate high-fidelity question-answer pairs grounded in real user queries. It further combines fine-grained access control with model memorization analysis to enforce strict permission isolation for sensitive design data and mitigate privacy-leakage risks. Experiments demonstrate significant accuracy improvements on EDA tasks such as design verification. Crucially, synthetic data proves an effective substitute for scarce human annotations, establishing a reusable technical pathway for adapting LLMs to vertical domains.
Abstract
Electronic design engineers often struggle to efficiently access relevant information for tasks like design verification and technology development. While large language models (LLMs) can enhance productivity as conversational agents, pre-trained open-source LLMs lack domain-specific knowledge for Electronic Design Automation (EDA). In a Retrieval-Augmented Generation (RAG) context, LLMs rely on external context but may still produce inaccurate responses. Retrieval-Augmented Fine-Tuning (RAFT) improves LLM performance, but acquiring labeled question/answer (Q/A) data in EDA is difficult. To address this, we propose using synthetic Q/A datasets to enhance LLMs with RAFT. Our results show that RAFT with synthetic data significantly boosts LLM performance for RAG-based EDA tasks. We also investigate the impact of using real user questions as Retrieval-Augmented Few-Shot (RAFS) examples for synthetic data generation. Additionally, we implement secure access control to ensure sensitive information is only accessible to authorized personnel. Finally, we assess the risk of data leakage and unintended memorization during fine-tuning with synthetic data, providing practical insights.
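To make the RAFS idea concrete, here is a minimal sketch of how a real user question plus retrieved documentation could seed a synthetic Q/A pair. This is an illustration only: the helper names (`retrieve`, `llm`, `synthesize_qa`), the toy lexical retriever, and the canned LLM stub are all assumptions, not the paper's implementation; a real pipeline would use a vector retriever and an actual model call.

```python
# Hedged sketch of Retrieval-Augmented Few-Shot (RAFS) synthetic Q/A generation.
# All helper names here are hypothetical illustrations, not the paper's code.
from dataclasses import dataclass

@dataclass
class QAPair:
    question: str
    answer: str
    context: str

# Toy stand-in for an EDA documentation corpus.
DOC_CHUNKS = [
    "Design rule checking (DRC) verifies that a layout meets foundry constraints.",
    "Logic equivalence checking compares RTL against the synthesized netlist.",
    "Static timing analysis reports setup and hold violations across corners.",
]

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Toy lexical retriever: rank chunks by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return scored[:k]

def llm(prompt: str) -> str:
    """Stub standing in for a real LLM call; returns a canned Q/A string.
    In practice this would invoke a hosted or locally served model."""
    return "Q: What does DRC verify? A: That a layout meets foundry constraints."

def synthesize_qa(real_user_question: str, chunks: list[str]) -> QAPair:
    """Ground a synthetic Q/A pair in retrieved context, using a real user
    question as a few-shot exemplar of the question style to imitate."""
    context = "\n".join(retrieve(real_user_question, chunks))
    prompt = (
        f"Context:\n{context}\n\n"
        f"Example user question: {real_user_question}\n"
        "Write one new question answerable from the context, then its answer."
    )
    raw = llm(prompt)
    q, _, a = raw.partition(" A: ")
    return QAPair(
        question=q.removeprefix("Q: ").strip(),
        answer=a.strip(),
        context=context,
    )

pair = synthesize_qa("How do I check design rules?", DOC_CHUNKS)
print(pair.question)
```

The resulting `(question, answer, context)` triples can then be used for retrieval-augmented fine-tuning, so the model learns to answer from the retrieved context rather than from memorized training text.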