🤖 AI Summary
Addressing the absence of foundation models, the scarcity of high-quality domain corpora, and the inherent complexity of knowledge structures in analog circuit design, this paper introduces an open-source foundation language model tailored for analog circuits. Methodologically, we establish a domain-specific knowledge framework to guide corpus construction; design a multi-agent system that automatically extracts fine-grained question-answer pairs from textbooks, turning implicit knowledge into an explicit, structured representation; and propose a fine-grained knowledge distillation technique coupled with neighborhood self-constrained supervised fine-tuning to enhance generalization and output stability under few-shot settings. Built upon the Qwen2.5-32B-Instruct architecture and trained on the curated domain corpus, the model achieves 85.04% accuracy on the AMSBench-TQA benchmark, a 15.67-percentage-point improvement over the original base model, is competitive with mainstream commercial models, and demonstrates practical utility in operational amplifier design tasks. The model is publicly released.
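The paper ships no reference code for the multi-agent distillation pipeline, but the summary's description (decompose the corpus into granular learning nodes, then have agents distill each node into a question-answer pair with reasoning) suggests a structure like the sketch below. Everything here is an illustrative assumption: `split_into_nodes`, `QAPair`, the agent prompts, and the `Q###reasoning###A` output format are hypothetical, and `llm` stands for any chat-completion function that takes a prompt and returns text.

```python
# Hypothetical sketch of a two-agent QA distillation pipeline; not the
# authors' implementation. All names and prompts below are assumptions.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class QAPair:
    question: str
    reasoning: str  # detailed reasoning process distilled from the source text
    answer: str


def split_into_nodes(corpus: str, max_chars: int = 2000) -> Iterable[str]:
    """Decompose raw textbook text into granular learning nodes.

    A real pipeline would split on sections or concepts; fixed-size
    character chunks are a stand-in here.
    """
    for i in range(0, len(corpus), max_chars):
        yield corpus[i:i + max_chars]


def distill_node(node: str, llm: Callable[[str], str]) -> QAPair:
    """Run a generator agent, then a reviewer agent, over one learning node."""
    draft = llm(
        "From the analog-circuit passage below, write one exam-style question, "
        "a detailed reasoning process, and a final answer, separated by '###'.\n\n"
        + node
    )
    # A second agent checks the draft against the source passage and revises it.
    revised = llm(
        "Verify that the following question/reasoning/answer triple is faithful "
        "to the passage; correct any errors and return it in the same format.\n\n"
        f"PASSAGE:\n{node}\n\nTRIPLE:\n{draft}"
    )
    parts = [p.strip() for p in revised.split("###")]
    if len(parts) != 3:
        raise ValueError("agent output not in Q###reasoning###A format")
    return QAPair(question=parts[0], reasoning=parts[1], answer=parts[2])


def build_dataset(corpus: str, llm: Callable[[str], str]) -> list[QAPair]:
    """Distill every learning node in the corpus into a fine-tuning example."""
    return [distill_node(node, llm) for node in split_into_nodes(corpus)]
```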
📝 Abstract
In this paper, we propose AnalogSeeker, an effort toward an open-source foundation language model for analog circuit design, with the aim of integrating domain knowledge and providing design assistance. To overcome the scarcity of data in this field, we employ a corpus collection strategy based on the domain knowledge framework of analog circuits: high-quality, accessible textbooks across relevant subfields are systematically curated and cleaned into a textual domain corpus. To address the complexity of analog circuit knowledge, we introduce a granular domain knowledge distillation method. The raw, unlabeled domain corpus is decomposed into typical, granular learning nodes, where a multi-agent framework distills the implicit knowledge embedded in unstructured text into question-answer pairs with detailed reasoning processes, yielding a fine-grained, learnable dataset for fine-tuning. To address the largely unexplored challenges of training analog circuit foundation models, we investigate training methods through both theoretical analysis and experimental validation, and share our findings. We ultimately establish a fine-tuning-centric training paradigm, customizing and implementing a neighborhood self-constrained supervised fine-tuning algorithm. This approach enhances training outcomes by constraining the perturbation magnitude between the model's output distributions before and after training. In practice, we fine-tune Qwen2.5-32B-Instruct to obtain AnalogSeeker, which achieves 85.04% accuracy on AMSBench-TQA, the analog circuit knowledge evaluation benchmark, a 15.67-percentage-point improvement over the original model, and is competitive with mainstream commercial models. AnalogSeeker also proves effective on the downstream operational amplifier design task. It is open-sourced at https://huggingface.co/analogllm/analogseeker for research use.
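To make the "neighborhood self-constrained" idea concrete, here is a minimal PyTorch sketch of one way to read the abstract's description: standard SFT cross-entropy plus a penalty on the divergence between the fine-tuned model's token distribution and that of a frozen copy of the original model, keeping the new outputs in a neighborhood of the pre-training ones. The abstract does not specify the loss form; the KL direction, the token masking, and the weight `beta` are all assumptions, not the paper's actual formulation.

```python
# Assumed sketch of a neighborhood self-constrained SFT loss, inferred from
# the abstract; the paper's exact divergence and weighting may differ.
import torch
import torch.nn.functional as F


def nsc_sft_loss(policy_logits: torch.Tensor,  # (B, T, V), trainable model
                 ref_logits: torch.Tensor,     # (B, T, V), frozen original model
                 labels: torch.Tensor,         # (B, T), -100 on masked positions
                 beta: float = 0.1) -> torch.Tensor:
    # Standard supervised fine-tuning term on the target tokens.
    ce = F.cross_entropy(
        policy_logits.view(-1, policy_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    # Self-constraint term: per-token KL(policy || reference), averaged over
    # supervised positions, bounding the perturbation of the output
    # distribution relative to the model before fine-tuning.
    log_p = F.log_softmax(policy_logits, dim=-1)
    log_q = F.log_softmax(ref_logits, dim=-1)
    kl = (log_p.exp() * (log_p - log_q)).sum(-1)  # (B, T)
    mask = (labels != -100).float()
    kl = (kl * mask).sum() / mask.sum().clamp(min=1.0)
    return ce + beta * kl
```

In use, `ref_logits` would come from a frozen copy of Qwen2.5-32B-Instruct evaluated under `torch.no_grad()` on the same batch, so the penalty anchors the fine-tuned distribution without adding trainable parameters.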