Error Notebook-Guided, Training-Free Part Retrieval in 3D CAD Assemblies via Vision-Language Models

📅 2025-09-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Retrieving parts from complex CAD assemblies based on textual specifications remains challenging: metadata sequences can exceed model token limits, off-the-shelf models perform poorly, and fine-tuning is costly or unavailable. Method: This paper proposes a training-free part retrieval framework that combines multimodal large language models (e.g., GPT-4o, Gemini), retrieval-augmented generation (RAG), and chain-of-thought (CoT) reasoning. It introduces the "Error Notebook", a repository that records historical erroneous reasoning chains together with their reflective corrections and final answers; relevant records are retrieved at inference time to refine the prompt. Contribution/Results: The framework requires no model fine-tuning and works with commercial closed-source models, improving retrieval in domain-specific CAD scenarios. On the human preference dataset, GPT-4o achieves up to a 23.4% absolute accuracy improvement. Ablation studies indicate that CoT reasoning helps most on challenging assemblies with more than ten parts.

📝 Abstract

Effective specification-aware part retrieval within complex CAD assemblies is essential for automated design verification and downstream engineering tasks. However, directly applying LLMs/VLMs to this task presents several challenges: the input sequences may exceed model token limits, and even after processing, performance remains unsatisfactory. Moreover, fine-tuning LLMs/VLMs requires significant computational resources, and for many high-performing general-use proprietary models (e.g., GPT or Gemini), fine-tuning access is not available. In this paper, we propose a novel part retrieval framework that requires no extra training and instead uses Error Notebooks and RAG for refined prompt engineering to improve the retrieval performance of existing general-purpose models. The construction of Error Notebooks consists of two steps: (1) collecting historical erroneous CoTs and their incorrect answers, and (2) connecting these CoTs through reflective corrections until the correct solutions are obtained. As a result, the Error Notebooks serve as a repository of tasks along with their corrected CoTs and final answers. RAG is then employed to retrieve specification-relevant records from the Error Notebooks and incorporate them into the inference process. Another major contribution of our work is a human-in-the-loop CAD dataset, which is used to evaluate our method. In addition, the engineering value of our framework lies in its ability to effectively handle 3D models with lengthy, non-natural-language metadata. Experiments with proprietary models, including GPT-4o and the Gemini series, show substantial gains, with GPT-4o (Omni) achieving up to a 23.4% absolute accuracy improvement on the human preference dataset. Moreover, ablation studies confirm that CoT reasoning provides benefits especially in challenging cases with higher part counts (>10).
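The two construction steps and the retrieval step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: every name here (`NotebookEntry`, `build_entry`, `retrieve`, `build_prompt`) is hypothetical, the `solve` callable stands in for a call to a model such as GPT-4o, and the bag-of-words cosine similarity is an assumed placeholder for whatever retriever the paper actually uses. Skipping tasks that are solved on the first attempt is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass, field
from math import sqrt
from typing import Callable, List, Optional, Tuple

Attempt = Tuple[str, str]  # (chain of thought, answer)


@dataclass
class NotebookEntry:
    """One Error Notebook record (field names are illustrative)."""
    task: str                                   # textual part specification
    failed_attempts: List[Attempt] = field(default_factory=list)
    corrected_cot: str = ""
    final_answer: str = ""


def build_entry(task: str, truth: str,
                solve: Callable[[str, List[Attempt]], Attempt],
                max_rounds: int = 3) -> Optional[NotebookEntry]:
    """Log erroneous CoTs (step 1), then reflect until correct (step 2)."""
    entry = NotebookEntry(task=task)
    for _ in range(max_rounds):
        # `solve` sees earlier failures so it can reflect on them.
        cot, answer = solve(task, entry.failed_attempts)
        if answer == truth:
            if not entry.failed_attempts:
                return None  # assumption: no error occurred, nothing to log
            entry.corrected_cot, entry.final_answer = cot, answer
            return entry
        entry.failed_attempts.append((cot, answer))
    return None  # never corrected within the round budget


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (assumed stand-in retriever)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (sqrt(sum(v * v for v in va.values()))
            * sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0


def retrieve(notebook: List[NotebookEntry], spec: str,
             k: int = 2) -> List[NotebookEntry]:
    """Return the k records whose task text is most similar to the spec."""
    ranked = sorted(notebook, key=lambda e: cosine(e.task, spec), reverse=True)
    return ranked[:k]


def build_prompt(spec: str, records: List[NotebookEntry]) -> str:
    """Prepend retrieved records to the new specification."""
    parts = []
    for r in records:
        mistakes = "; ".join(f"{cot} -> {ans}" for cot, ans in r.failed_attempts)
        parts.append(f"Task: {r.task}\nPast mistakes: {mistakes}\n"
                     f"Corrected reasoning: {r.corrected_cot}\n"
                     f"Answer: {r.final_answer}")
    parts.append(f"New specification: {spec}\nThink step by step.")
    return "\n\n".join(parts)
```

In use, `build_entry` would run offline over tasks with known answers to fill the notebook, and `retrieve` plus `build_prompt` would run at inference time, so the underlying model is never fine-tuned.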

Problem

Research questions and friction points this paper is trying to address.

Improving part retrieval in 3D CAD assemblies without model fine-tuning

Addressing token limit challenges when using VLMs on CAD data

Enhancing retrieval accuracy for complex engineering specifications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free framework using Error Notebooks

RAG-enhanced prompt engineering for VLMs

Handles lengthy non-natural language metadata

🔎 Similar Papers

QueryCAD: Grounded Question Answering for CAD Models