AI Summary
Existing approaches to flowchart visual question answering often neglect fine-grained semantic relationships between nodes, such as conditional or causal dependencies, when converting diagrams into intermediate representations, which limits their capacity for complex reasoning. To address this, we propose a semantic relation-aware method that leverages large language models to explicitly extract and model such inter-node semantic relations. Our approach upgrades conventional link-based intermediate languages into semantically enriched representations and introduces a question-intent-driven controllable reasoning mechanism that dynamically selects between shallow and deep reasoning paths. This work is the first to integrate fine-grained semantic relations into flowchart intermediate representations, achieving significant performance gains on the FlowVQA benchmark and consistent improvements across multiple intermediate languages (Graphviz, Mermaid, and PlantUML), which demonstrates the effectiveness of semantic relation modeling for complex flowchart question answering.
Abstract
Flowchart Question Answering (FlowchartQA) is a multi-modal task that automatically answers questions conditioned on graphic flowcharts. Current studies convert flowcharts into interlanguages (e.g., Graphviz) for Question Answering (QA), which effectively bridges the modality gap between questions and flowcharts. More importantly, the interlanguages expose the link relations between nodes in the flowchart, which supports shallow relational reasoning when tracing answers. However, existing interlanguages still overlook intricate semantic and logical relationships such as Conditional and Causal relations, and this hinders deep reasoning on complex questions. To address the issue, we propose a novel Semantic Relation-Aware (SRA) FlowchartQA approach. It leverages a Large Language Model (LLM) to detect the discourse semantic relations between nodes, by which a link-based interlanguage is upgraded to a semantic relation-based interlanguage. In addition, we conduct an interlanguage-controllable reasoning process, in which the question intention is analyzed to determine the depth of reasoning (Shallow or Deep) as well as the best-matched interlanguage. We experiment on the benchmark dataset FlowVQA. The test results show that SRA yields widespread improvements when upgrading different interlanguages such as Graphviz, Mermaid and PlantUML.
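To make the distinction between a link-based and a semantic relation-based interlanguage concrete, the following is a minimal Python sketch and not the authors' implementation. It serializes the same toy flowchart to Graphviz DOT twice, once with plain links and once with relation tags, and then picks one of the two depending on the question. Several pieces are assumptions made for illustration only: the `Edge` class, the custom `relation` edge attribute, the example nodes, and the keyword list in `DEEP_CUES`. In particular, the keyword check merely stands in for the LLM-based question intention analysis described above, and the relation tags are written by hand rather than detected by an LLM.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edge:
    """One arrow of a flowchart (hypothetical structure for this sketch)."""
    src: str
    dst: str
    label: Optional[str] = None      # branch text on the arrow, e.g. "yes"
    relation: Optional[str] = None   # assumed semantic tag, e.g. "Conditional"


def to_dot(edges: List[Edge], with_relations: bool = False) -> str:
    """Serialize edges to Graphviz DOT.

    with_relations=False gives the plain link-based form;
    with_relations=True adds the assumed `relation` attribute to each edge.
    """
    lines = ["digraph flowchart {"]
    for e in edges:
        attrs = []
        if e.label:
            attrs.append(f'label="{e.label}"')
        if with_relations and e.relation:
            attrs.append(f'relation="{e.relation}"')
        suffix = f" [{', '.join(attrs)}]" if attrs else ""
        lines.append(f'  "{e.src}" -> "{e.dst}"{suffix};')
    lines.append("}")
    return "\n".join(lines)


# Illustrative cue words only; a stand-in for LLM-based intention analysis.
DEEP_CUES = ("why", "what if", "what happens if", "cause", "unless")


def choose_interlanguage(question: str, edges: List[Edge]) -> str:
    """Return the relation-enriched DOT for 'deep' questions, plain DOT otherwise."""
    q = question.lower()
    deep = any(cue in q for cue in DEEP_CUES)
    return to_dot(edges, with_relations=deep)


# Toy flowchart with hand-written relation tags (not model output).
EXAMPLE_EDGES = [
    Edge("Check stock", "Ship order", label="yes", relation="Conditional"),
    Edge("Ship order", "Send invoice", relation="Causal"),
]

if __name__ == "__main__":
    print(choose_interlanguage("Which node follows Ship order?", EXAMPLE_EDGES))
    print(choose_interlanguage("Why is the invoice sent?", EXAMPLE_EDGES))
```

In this sketch the first question only needs to follow an arrow, so the plain link-based DOT is returned, while the second asks about a cause and therefore receives the version in which each edge carries its relation tag.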