Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning

📅 2025-12-23
🤖 AI Summary
Existing 3D spatial reasoning methods suffer from insufficient geometric computation accuracy, while visual programming approaches rely either on fixed toolsets or on inefficient inductive tool discovery. Method: This paper proposes a visual programming framework that dynamically constructs a reusable tool library grounded in problem-solving experience. It introduces a novel transductive tool generation paradigm—free of prior assumptions—that leverages vision-language model–driven program synthesis, pattern abstraction, and exemplar-based feedback to realize a closed-loop evolution: “experience accumulation → pattern distillation → tool refinement.” The tool library autonomously and incrementally optimizes itself during task solving. Contribution/Results: The framework generalizes strongly to unseen spatial tasks. On Omni3D-Bench, it outperforms GPT-4o by 22% and surpasses the previous state of the art by 11%. Its tools are invoked five times more frequently than those of inductive methods, and it attains SOTA on SpatialScore-Hard without any task-specific adaptation.

📝 Abstract
Spatial reasoning in 3D scenes requires precise geometric calculations that challenge vision-language models. Visual programming addresses this by decomposing problems into steps that call specialized tools, yet existing methods rely either on fixed toolsets or on speculative tool induction before solving problems, resulting in suboptimal programs and poor utilization of the induced tools. We present Transductive Visual Programming (TVP), a novel framework that builds new tools from its own experience rather than from speculation. TVP first solves problems using basic tools while accumulating experiential solutions into an Example Library, then abstracts recurring patterns from these programs into reusable higher-level tools for an evolving Tool Library. This allows TVP to tackle new problems with increasingly powerful tools learned from experience. On Omni3D-Bench, TVP achieves state-of-the-art performance, outperforming GPT-4o by 22% and the previous best visual programming system by 11%. Our transductively learned tools are used 5x more frequently as core program dependencies than inductively created ones, demonstrating more effective tool discovery and reuse. The evolved tools also generalize strongly to unseen spatial tasks, achieving superior performance on benchmarks from the SpatialScore-Hard collection without any test-set-specific modification. Our work establishes experience-driven transductive tool creation as a powerful paradigm for building self-evolving visual programming agents that effectively tackle challenging spatial reasoning tasks. We release our code at https://transductive-visualprogram.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Develops self-evolving visual programming agents for spatial reasoning tasks
Creates reusable tools from experiential solutions rather than speculative induction
Improves tool utilization and generalization for 3D geometric calculations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Builds new tools from experiential solutions
Abstracts recurring patterns into reusable higher-level tools
Evolves tool libraries for increasingly powerful spatial reasoning
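The experience-driven loop described above (solve with basic tools → accumulate programs → distill recurring patterns → register new tools) can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's actual implementation: all class and function names (`ToolLibrary`, `ExampleLibrary`, `evolve`) are hypothetical, and the "pattern mining" here is a toy that only counts repeated pairs of consecutive tool calls, whereas TVP abstracts patterns from full programs with a vision-language model.

```python
# Toy sketch of a transductive "experience -> pattern -> tool" loop.
# All names are illustrative; they are not the paper's API.
from collections import Counter

class ToolLibrary:
    """Evolving library: starts with basic tools, gains abstracted ones."""
    def __init__(self, basic_tools):
        self.tools = dict(basic_tools)

    def add(self, name, fn):
        self.tools[name] = fn

class ExampleLibrary:
    """Accumulates solved programs as tool-call sequences (the 'experience')."""
    def __init__(self):
        self.programs = []

    def record(self, program):
        self.programs.append(program)

    def recurring_patterns(self, min_count=2):
        # Toy pattern mining: count repeated pairs of consecutive calls.
        pairs = Counter()
        for prog in self.programs:
            for a, b in zip(prog, prog[1:]):
                pairs[(a, b)] += 1
        return [p for p, n in pairs.items() if n >= min_count]

def evolve(tools, examples):
    """Distill recurring call patterns into new higher-level composed tools."""
    for a, b in examples.recurring_patterns():
        name = f"{a}_then_{b}"
        if name not in tools.tools:
            fa, fb = tools.tools[a], tools.tools[b]
            tools.add(name, lambda x, fa=fa, fb=fb: fb(fa(x)))
    return tools

# Usage: two basic "geometric" tools, three solved programs sharing a pattern.
tools = ToolLibrary({"depth": lambda x: x * 2, "scale": lambda x: x + 1})
examples = ExampleLibrary()
for _ in range(3):
    examples.record(["depth", "scale"])  # the same pattern recurs
tools = evolve(tools, examples)
print(sorted(tools.tools))  # now includes the abstracted "depth_then_scale"
```

The point of the sketch is the closed loop: new problems are solved with whatever the library currently holds, solutions feed the example store, and abstraction runs periodically so later problems can call higher-level tools directly instead of re-deriving the same multi-step geometry.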