🤖 AI Summary
To address the challenge engineers face in comprehending complex third-party timing diagrams (TDs), this paper proposes a visual question-answering (VQA) system designed specifically for TD understanding. Methodologically, the authors introduce a controllable TD synthesis pipeline to mitigate annotation scarcity and perform domain-adaptive fine-tuning of LLaVA, a lightweight multimodal large language model, for joint modeling of TD images and natural-language queries, with untuned GPT-4o serving as a strong baseline for systematic evaluation. The contributions are threefold: (1) the first dedicated VQA framework for TD understanding; (2) a synthetic-data-driven domain-transfer approach that enables effective adaptation to the TD modality; and (3) state-of-the-art performance across multiple TD comprehension benchmarks, outperforming the zero-shot GPT-4o baseline by an average of 23.6%, demonstrating both technical efficacy and engineering practicality.
📝 Abstract
We introduce TD-Interpreter, a specialized ML tool that assists engineers in understanding complex timing diagrams (TDs), originating from a third party, during their design and verification process. TD-Interpreter is a visual question-answering environment that allows engineers to input a set of TDs and ask design and verification queries regarding these TDs. We implemented TD-Interpreter with multimodal learning by fine-tuning LLaVA, a lightweight 7B Multimodal Large Language Model (MLLM). To address limited training data availability, we developed a synthetic data generation workflow that aligns visual information with its textual interpretation. Our experimental evaluation demonstrates the usefulness of TD-Interpreter, which outperformed untuned GPT-4o by a large margin on the evaluated benchmarks.
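To make the synthetic data generation idea concrete, here is a minimal illustrative sketch (not the paper's actual pipeline, whose details are not given in the abstract). It generates a WaveDrom-style waveform specification, which a renderer could turn into a TD image, together with a question-answer pair derived from the same ground truth, so the visual and textual views are aligned by construction. The function name `make_sample` and the signal name `sig_a` are hypothetical.

```python
import json
import random

def make_sample(n_cycles=8, seed=0):
    """Generate one synthetic (diagram spec, question, answer) triple.

    Illustrative sketch only: the diagram is a WaveDrom-style wave
    string ('0'/'1' per cycle); the QA pair is read off the same spec,
    keeping image and text aligned by construction.
    """
    rng = random.Random(seed)
    # Random binary waveform, forced to start low and contain a rising edge.
    bits = [rng.choice("01") for _ in range(n_cycles)]
    if "1" not in bits[1:]:
        bits[-1] = "1"
    bits[0] = "0"
    wave = "".join(bits)
    first_high = wave.index("1")  # ground-truth answer from the spec
    spec = {"signal": [{"name": "sig_a", "wave": wave}]}
    question = "In which cycle does sig_a first go high (0-indexed)?"
    return json.dumps(spec), question, str(first_high)
```

Because the answer is computed from the specification rather than annotated by hand, such a generator can produce arbitrarily many consistent training triples, which is the property the workflow above relies on to offset annotation scarcity.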