NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current vision-language models (VLMs) show limited compositional reasoning, i.e., the ability to decompose and recombine concepts to solve novel tasks. Neuro-symbolic approaches target this gap but remain constrained by rigid logical execution and predefined predicates, which limits their flexibility. NePTune is a hybrid execution framework with differentiable operations that dynamically translates natural language queries into executable Python programs. It decouples perception, handled by a foundation VLM, from symbolic reasoning, which is implemented with soft logic operators and native Python control flow. The framework runs training-free and also supports fine-tuning, without relying on fixed predicate sets or crisp logic. Experiments on multiple visual reasoning benchmarks and domains, including adversarial tests, show significant improvements over strong base models, along with effective compositional generalization and adaptation to novel environments.

📝 Abstract
Modern Vision-Language Models (VLMs) have achieved impressive performance in various tasks, yet they often struggle with compositional reasoning, the ability to decompose and recombine concepts to solve novel problems. While neuro-symbolic approaches offer a promising direction, they are typically constrained by crisp logical execution or predefined predicates, which limit flexibility. In this work, we introduce NePTune, a neuro-symbolic framework that overcomes these limitations through a hybrid execution model that integrates the perception capabilities of foundation vision models with the compositional expressiveness of symbolic reasoning. NePTune dynamically translates natural language queries into executable Python programs that blend imperative control flow with soft logic operators capable of reasoning over VLM-generated uncertainty. NePTune's modular design decouples perception from reasoning and operates in a training-free manner, yet its differentiable operations also support fine-tuning. We evaluate NePTune on multiple visual reasoning benchmarks across various domains, including adversarial tests, and demonstrate a significant improvement over strong base models, as well as effective compositional generalization and adaptation capabilities in novel environments.
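To make the idea of a program that blends imperative control flow with soft logic operators more concrete, here is a minimal sketch. It is not NePTune's actual API or generated output: the operator definitions (product t-norm and probabilistic sum), the `vlm_score` stub, the object IDs, and the scores are all assumptions chosen for illustration. A real system would obtain the scores from a VLM rather than a lookup table.

```python
from typing import Iterable

# Soft logic operators over confidences in [0, 1].
# Assumption: product t-norm / probabilistic sum, one common choice among several.
def soft_and(a: float, b: float) -> float:
    return a * b

def soft_or(a: float, b: float) -> float:
    return a + b - a * b

def soft_not(a: float) -> float:
    return 1.0 - a

def soft_exists(scores: Iterable[float]) -> float:
    """Soft existential quantifier: a running soft OR over all scores."""
    result = 0.0
    for s in scores:
        result = soft_or(result, s)
    return result

# Hypothetical stand-in for VLM perception: (object, concept) -> confidence.
STUB_VLM_SCORES = {
    ("obj0", "red"): 0.9, ("obj0", "cube"): 0.8,
    ("obj1", "red"): 0.1, ("obj1", "cube"): 0.7,
    ("obj2", "red"): 0.2, ("obj2", "cube"): 0.05,
}

def vlm_score(obj: str, concept: str) -> float:
    return STUB_VLM_SCORES.get((obj, concept), 0.0)

def program(objects: list[str]) -> float:
    """Hypothetical program for the query 'Is there a red cube?'."""
    per_object = []
    for obj in objects:  # ordinary Python control flow
        is_red_cube = soft_and(vlm_score(obj, "red"), vlm_score(obj, "cube"))
        per_object.append(is_red_cube)
    return soft_exists(per_object)  # soft answer in [0, 1]

answer = program(["obj0", "obj1", "obj2"])
print(f"confidence that a red cube exists: {answer:.3f}")
```

The program returns a graded confidence instead of a hard true/false, so uncertainty in the perception scores carries through to the final answer rather than being thresholded away at each step.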
Problem

Research questions and friction points this paper is trying to address.

Enhancing compositional reasoning in vision-language models through neuro-symbolic integration
Overcoming the limitations of rigid logical execution by translating queries into flexible Python programs
Improving generalization in visual reasoning tasks via a training-free, modular framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates vision models with symbolic reasoning
Translates queries into executable Python programs
Uses soft logic operators for uncertainty reasoning
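The points above suggest why soft operators make the reasoning tunable: because the program output is a smooth function of the perception scores, a loss on the final answer can send gradients back to learnable parameters. The sketch below illustrates this with a single hypothetical calibration bias on one concept logit and a hand-derived gradient. The parameter, the loss, the learning rate, and all numeric values are assumptions for illustration only, not the paper's fine-tuning procedure.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def soft_and(a: float, b: float) -> float:
    # Product t-norm (assumed); differentiable in both arguments.
    return a * b

def loss_and_grad(bias: float, logit_red: float, p_cube: float):
    """Negative log-likelihood for a 'yes' label, plus d(loss)/d(bias)."""
    p_red = sigmoid(logit_red + bias)   # perception score with learnable bias
    out = soft_and(p_red, p_cube)       # program output
    loss = -math.log(out)
    # Chain rule: d(loss)/d(out) = -1/out, d(out)/d(bias) = p_cube * p_red * (1 - p_red)
    grad = -(1.0 / out) * p_cube * p_red * (1.0 - p_red)
    return loss, grad

# Hypothetical values: an under-confident "red" logit and a fixed "cube" score.
logit_red, p_cube = -0.5, 0.8
bias = 0.0
learning_rate = 0.5

initial_loss, _ = loss_and_grad(bias, logit_red, p_cube)
for _ in range(50):  # plain gradient descent on the bias
    _, grad = loss_and_grad(bias, logit_red, p_cube)
    bias -= learning_rate * grad
final_loss, _ = loss_and_grad(bias, logit_red, p_cube)

print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}, bias: {bias:.3f}")
```

With a crisp AND that thresholds its inputs, the output would be piecewise constant and the gradient would be zero almost everywhere, so this kind of update would not be possible. In practice an automatic differentiation library would replace the hand-written gradient.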