🤖 AI Summary
Current vision-language AI assistants struggle with complex visual queries requiring set-level reasoning—such as filtering, comparison, and aggregation—due to the absence of explicit compositional logic mechanisms. This work proposes a program-driven visual reasoning paradigm that models inference as visual program synthesis: a multimodal large language model first generates a symbolic program, which is then executed by a dedicated interpreter over a parsed visual scene representation, enabling structured and interpretable reasoning. To evaluate this approach, we introduce the Set-VQA benchmark, where our method substantially outperforms existing models, achieving significant improvements in accuracy, systematicity, and transparency.
📝 Abstract
A user pointing their phone at a supermarket shelf and asking "Which soda has the least sugar?" poses a difficult challenge for current visual AI assistants. Such queries require not only object recognition but also explicit set-based reasoning such as filtering, comparison, and aggregation. Standard end-to-end MLLMs often fail at these tasks because they lack an explicit mechanism for compositional logic. We propose treating visual reasoning as Visual Program Synthesis, where the model first generates a symbolic program that is then executed by a separate engine grounded in the visual scene. We also introduce Set-VQA, a new benchmark designed specifically to evaluate set-based visual reasoning. Experiments show that our approach significantly outperforms state-of-the-art baselines on complex reasoning tasks, producing more systematic and transparent behavior while substantially improving answer accuracy. These results demonstrate that program-driven reasoning offers a principled alternative to black-box vision-language inference.
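The pipeline described above, in which a model emits a symbolic program that a separate interpreter executes over a parsed scene representation, can be sketched as follows. The scene format, the mini-DSL of operations, and all names here are illustrative assumptions for the running soda example, not the paper's actual interface:

```python
# A minimal sketch of program-driven visual reasoning, assuming a parsed
# scene is a list of attribute dictionaries produced by a perception stage.
# The DSL ops (filter / argmin / query) are hypothetical.

scene = [
    {"category": "soda", "name": "ColaMax", "sugar_g": 39},
    {"category": "soda", "name": "LightFizz", "sugar_g": 2},
    {"category": "juice", "name": "OrangePure", "sugar_g": 22},
]

def execute(program, scene):
    """Run a list of (op, arg) steps over the parsed scene."""
    result = scene
    for op, arg in program:
        if op == "filter":       # keep objects whose attribute matches a value
            key, value = arg
            result = [obj for obj in result if obj[key] == value]
        elif op == "argmin":     # select the object minimizing an attribute
            result = min(result, key=lambda obj: obj[arg])
        elif op == "query":      # read out an attribute of the selected object
            result = result[arg]
        else:
            raise ValueError(f"unknown op: {op}")
    return result

# A program an MLLM might generate for "Which soda has the least sugar?"
program = [
    ("filter", ("category", "soda")),
    ("argmin", "sugar_g"),
    ("query", "name"),
]
print(execute(program, scene))  # → LightFizz
```

Separating the program from its execution is what yields the transparency claimed above: each intermediate `result` is an inspectable set of objects rather than an opaque activation.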