DynamicGTR: Leveraging Graph Topology Representation Preferences to Boost VLM Capabilities on Graph QAs

📅 2026-02-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing vision-language models (VLMs) struggle to balance answer accuracy and conciseness in graph question answering because they rely on a single, fixed graph topology representation (GTR). This work proposes DynamicGTR, a framework that selects the most suitable GTR for each query at inference time, for example a visual image or a textual description, using a dynamic representation scheduling mechanism that supports a customizable trade-off between accuracy and brevity. Experiments show that DynamicGTR improves zero-shot graph algorithm QA over static representations and transfers the experience gained on synthetic graph algorithm tasks to real-world applications such as link prediction and node classification without additional training. It also transfers well across tasks, domains, and VLMs.

📝 Abstract

Vision-Language Models (VLMs) have emerged as versatile solutions for zero-shot question answering (QA) across various domains. However, enabling VLMs to effectively comprehend structured graphs and perform accurate, efficient QA remains challenging. Existing approaches typically rely on a single graph topology representation (GTR), such as fixed-style visual images or unified text descriptions. This "one-size-fits-all" strategy often neglects model-specific and task-specific preferences, resulting in inaccurate or overly long responses to graph-related queries. To address this, we propose the DynamicGTR framework, which dynamically selects the optimal GTR for each query during inference, thereby enhancing the zero-shot graph QA capabilities of VLMs with a customizable trade-off between accuracy and brevity. Extensive experiments show that DynamicGTR not only improves VLM-based graph algorithm QA performance but also transfers the experience learned from synthetic graph algorithm tasks to real-world applications such as link prediction and node classification, without any additional training. Additionally, DynamicGTR demonstrates strong transferability across tasks, domains, and models, suggesting its potential as a flexible solution for a broad range of graph scenarios.
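
The per-query selection with an accuracy and brevity trade-off might be pictured with the minimal sketch below. It is an illustration under stated assumptions, not the paper's method: the scoring rule (estimated accuracy minus a weighted, normalized response length), the names `GTRStats`, `select_gtr`, and `brevity_weight`, and all numbers are hypothetical and do not come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GTRStats:
    """Hypothetical estimates for one candidate representation on one query."""
    name: str
    est_accuracy: float  # assumed estimate of answer correctness, in [0, 1]
    est_tokens: float    # assumed estimate of response length in tokens


def select_gtr(candidates, brevity_weight=0.0, max_tokens=1000.0):
    """Return the candidate with the highest accuracy-minus-length score.

    brevity_weight = 0 ranks by estimated accuracy alone; larger values
    penalize long responses more heavily. This rule is an assumption made
    for illustration only.
    """
    if not candidates:
        raise ValueError("need at least one candidate representation")

    def score(c):
        return c.est_accuracy - brevity_weight * (c.est_tokens / max_tokens)

    return max(candidates, key=score)


# Made-up estimates for a single query; not results from the paper.
candidates = [
    GTRStats("visual_image", est_accuracy=0.80, est_tokens=900.0),
    GTRStats("text_edge_list", est_accuracy=0.75, est_tokens=300.0),
]

accuracy_first = select_gtr(candidates, brevity_weight=0.0)
brevity_aware = select_gtr(candidates, brevity_weight=0.5)
print(accuracy_first.name, brevity_aware.name)
```

With these made-up estimates, raising `brevity_weight` shifts the choice from the longer, slightly more accurate representation to the shorter one, which is the kind of customizable trade-off the abstract describes.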

Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models

Graph Question Answering

Graph Topology Representation

Zero-shot QA

Structured Graphs

Innovation

Methods, ideas, or system contributions that make the work stand out.

DynamicGTR

Graph Topology Representation

Vision-Language Models