Improving LLM-Powered EDA Assistants with RAFT

📅 2025-06-06
🤖 AI Summary
In electronic design automation (EDA), the scarcity of high-quality labeled data limits how accurately open-source large language models (LLMs) answer domain questions in retrieval-augmented generation (RAG). To address this, we apply retrieval-augmented fine-tuning (RAFT) to EDA, training on synthetic question/answer (Q/A) data instead of scarce human annotations. We also study retrieval-augmented few-shot (RAFS) generation, which uses real user questions as few-shot examples when synthesizing Q/A pairs. In addition, we implement secure access control so that sensitive design information is only accessible to authorized personnel, and we assess the risk of data leakage and unintended memorization during fine-tuning. Experiments show that RAFT with synthetic data significantly improves LLM performance on RAG-based EDA tasks, suggesting that synthetic data can stand in for scarce human annotations when adapting LLMs to a vertical domain.
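RAFT training data pairs each question with a retrieved context rather than with the question alone. The summary does not give the paper's exact sample format, so the sketch below assumes a common RAFT-style layout: the "oracle" chunk that the synthetic Q/A pair was generated from is mixed with distractor chunks, and for a fraction of samples the oracle is left out so the model also learns to cope with imperfect retrieval. `QAPair`, `build_raft_sample`, `num_distractors`, and `p_oracle` are hypothetical names chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class QAPair:
    """Synthetic Q/A pair plus the chunk it was generated from (hypothetical schema)."""
    question: str
    answer: str
    oracle_chunk: str


def build_raft_sample(pair, corpus, num_distractors=3, p_oracle=0.8, rng=None):
    """Assemble one RAFT-style fine-tuning example as a prompt/completion dict.

    With probability p_oracle the oracle chunk is placed among the distractors;
    otherwise the context holds distractors only.
    """
    if rng is None:
        rng = random.Random()
    # Distractor candidates: every corpus chunk except the one that answers the question.
    pool = [c for c in corpus if c != pair.oracle_chunk]
    context = rng.sample(pool, k=min(num_distractors, len(pool)))
    if rng.random() < p_oracle:
        context.append(pair.oracle_chunk)
    rng.shuffle(context)  # avoid always putting the oracle chunk in the same slot
    docs = "\n\n".join(f"[Doc {i + 1}]\n{c}" for i, c in enumerate(context))
    prompt = f"{docs}\n\nQuestion: {pair.question}\nAnswer:"
    return {"prompt": prompt, "completion": " " + pair.answer}
```

For example, `build_raft_sample(QAPair("q?", "ans", chunks[0]), chunks, rng=random.Random(0))` yields one reproducible sample; running it over every synthetic pair produces the fine-tuning set. The `p_oracle` knob is the main design choice: values below 1.0 deliberately expose the model to contexts that lack the answer.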

📝 Abstract
Electronic design engineers often struggle to efficiently access relevant information for tasks like design verification and technology development. While large language models (LLMs) can enhance productivity as conversational agents, pre-trained open-source LLMs lack domain-specific knowledge for Electronic Design Automation (EDA). In a Retrieval-Augmented Generation (RAG) context, LLMs rely on external context but may still produce inaccurate responses. Retrieval-Augmented Fine-Tuning (RAFT) improves LLM performance, but acquiring labeled question/answer (Q/A) data in EDA is difficult. To address this, we propose using synthetic Q/A datasets to enhance LLMs with RAFT. Our results show that RAFT with synthetic data significantly boosts LLM performance for RAG-based EDA tasks. We also investigate the impact of using real user questions as Retrieval-Augmented Few-Shot (RAFS) examples for synthetic data generation. Additionally, we implement secure access control to ensure sensitive information is only accessible to authorized personnel. Finally, we assess the risk of data leakage and unintended memorization during fine-tuning with synthetic data, providing practical insights.
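The abstract mentions using real user questions as retrieval-augmented few-shot (RAFS) examples for synthetic data generation but not how they are retrieved. A minimal sketch, assuming a plain bag-of-words cosine similarity in place of whatever retriever the authors used, picks the logged user questions most similar to a document chunk and places them in the generation prompt as style examples. `select_few_shot`, `rafs_prompt`, the prompt wording, and `k` are hypothetical.

```python
import math
import re
from collections import Counter


def _bow(text):
    """Lower-cased bag-of-words vector (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_few_shot(chunk, user_questions, k=2):
    """Return the k logged user questions most similar to the chunk."""
    cv = _bow(chunk)
    ranked = sorted(user_questions, key=lambda q: _cosine(cv, _bow(q)), reverse=True)
    return ranked[:k]


def rafs_prompt(chunk, user_questions, k=2):
    """Build a Q/A-generation prompt whose few-shot examples are retrieved real questions."""
    examples = "\n".join(f"- {q}" for q in select_few_shot(chunk, user_questions, k))
    return (
        "Here are questions real users have asked:\n"
        f"{examples}\n\n"
        "Write one new question in the same style that can be answered from the "
        "document below, then answer it using only the document.\n\n"
        f"Document:\n{chunk}\n"
    )
```

The resulting prompt would be sent to a generator LLM once per chunk. Retrieving examples per chunk, rather than using one fixed set, is what keeps the synthetic questions close to how engineers actually phrase requests about that topic.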
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs for EDA tasks with domain-specific knowledge
Addressing inaccurate responses in RAG-based EDA assistants
Mitigating data leakage risks during synthetic data fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses RAFT to enhance LLMs for EDA
Generates synthetic Q/A data for training
Implements secure access control for data
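The abstract states that sensitive information is only accessible to authorized personnel but does not describe the mechanism. One common way to enforce this in a RAG assistant, shown here as an assumption rather than the authors' design, is to tag every indexed chunk with the groups allowed to read it and drop unauthorized chunks before they reach the LLM context. `Chunk`, `allowed_groups`, and `authorized_context` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    text: str
    # Groups permitted to read this chunk; an empty set means nobody (deny by default).
    allowed_groups: frozenset = field(default_factory=frozenset)


def authorized_context(retrieved, user_groups, max_chunks=5):
    """Keep only retrieved chunks the user may read, preserving retrieval rank order."""
    user_groups = frozenset(user_groups)
    visible = [c for c in retrieved if c.allowed_groups & user_groups]
    return visible[:max_chunks]
```

Filtering before prompt assembly means restricted text never enters the context for an unauthorized user. It does not, however, address what a fine-tuned model may have memorized from its training data, which is why leakage during fine-tuning is treated as a separate risk.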
Authors
Luyao Shi
Staff Research Scientist, IBM Research
Information Retrieval, Machine Learning, Medical Imaging, AI for EDA, AI for Healthcare
Michael A. Kazda
IBM Infrastructure, Poughkeepsie, NY
Charles Schmitter
IBM Infrastructure, Poughkeepsie, NY
Hemlata Gupta
IBM Infrastructure, Poughkeepsie, NY