Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries

📅 2025-11-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether reinforcement learning (RL) post-training can extend the capabilities of vision-language models (VLMs) on visually grounded spatial reasoning tasks. Addressing a limitation of existing benchmarks, which emphasize language-centric evaluation and overlook spatial reasoning, the authors propose Ariadne, a controllable framework based on synthetic mazes, and train VLMs with Reinforcement Learning with Verified Rewards (RLVR) under a difficulty-aware curriculum to systematically train and evaluate multi-step spatial reasoning. They present empirical evidence that RL post-training lifts a VLM from 0% to over 50% accuracy on a spatial reasoning problem set the base model could not solve, and that training on synthetic mazes alone yields cross-domain zero-shot transfer with gains of 16–24% on real-world benchmarks. The study demonstrates both that RL meaningfully enhances VLMs' visual-spatial capabilities beyond linguistic alignment and that controllable synthetic environments offer a viable paradigm for evaluating and extending spatial reasoning in VLMs.

📝 Abstract
While Vision-Language Models (VLMs) post-trained with Reinforcement Learning (RL) show impressive general reasoning, their evaluation is often confined to language-dominant tasks (e.g., math). This raises a critical question: can RL post-training truly extend the inherent capability boundary of a base VLM, particularly for visual-centric spatial tasks where it initially fails? To investigate this, we introduce Ariadne, a framework utilizing synthetic mazes for multi-step spatial reasoning where task difficulty (e.g., path length, turns) is precisely controlled. We leverage this controllable environment to train VLMs using Reinforcement Learning with Verified Rewards (RLVR) in a difficulty-aware curriculum. Surprisingly, post-RLVR training, the VLM achieves over 50% accuracy on a problem set where the base model scored 0%, demonstrating that our approach expands the model's initial capability boundary. To assess real-world viability, we evaluate out-of-distribution (OOD) generalization on practical benchmarks. Despite training only on synthetic maze samples, Ariadne achieves significant zero-shot improvements, averaging 16% on MapBench (e.g., museum navigation) and 24% on ReasonMap (subway transfer tasks). These results confirm that our method not only broadens the model's fundamental limits but also enhances its generalization to real-world spatial reasoning. We acknowledge our study is limited to the post-training phase, given the opaqueness of pre-training data, and hope our research motivates further work on specialized, capability-extending alignment.
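The abstract's key design point is that task difficulty is precisely controlled via properties of the solution path (e.g., path length, number of turns). The paper does not include code here; the following is a minimal sketch of how such a controllable maze task generator might work, using rejection sampling over self-avoiding walks. All names (`generate_maze_task`, `sample_path`) and the specific difficulty knobs are illustrative assumptions, not the authors' implementation.

```python
import random

# Grid moves: each token maps to an (dx, dy) step.
MOVES = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}

def sample_path(grid_size, path_len, rng):
    """Sample a self-avoiding walk of exactly path_len steps (restart on dead ends)."""
    while True:
        pos = (rng.randrange(grid_size), rng.randrange(grid_size))
        visited = {pos}
        moves = []
        for _ in range(path_len):
            options = []
            for m, (dx, dy) in MOVES.items():
                nxt = (pos[0] + dx, pos[1] + dy)
                if 0 <= nxt[0] < grid_size and 0 <= nxt[1] < grid_size and nxt not in visited:
                    options.append((m, nxt))
            if not options:
                break  # dead end: restart the walk
            m, pos = rng.choice(options)
            visited.add(pos)
            moves.append(m)
        if len(moves) == path_len:
            return moves

def count_turns(moves):
    """A turn is any step whose direction differs from the previous step."""
    return sum(a != b for a, b in zip(moves, moves[1:]))

def generate_maze_task(grid_size=6, path_len=8, min_turns=2, seed=0):
    """Rejection-sample a solution path until it matches the target difficulty."""
    rng = random.Random(seed)
    while True:
        moves = sample_path(grid_size, path_len, rng)
        if count_turns(moves) >= min_turns:
            return moves
```

Because both knobs (`path_len`, `min_turns`) are explicit parameters, a curriculum can dial difficulty up smoothly, which is exactly the property the abstract attributes to the synthetic environment.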
Problem

Research questions and friction points this paper is trying to address.

Extending VLM reasoning boundaries for visual-centric spatial tasks
Developing a controllable framework for multi-step spatial reasoning evaluation
Enhancing generalization to real-world spatial reasoning benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses synthetic mazes for controlled spatial reasoning
Trains VLMs with difficulty-aware curriculum RLVR
Enables zero-shot generalization to real-world tasks
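The second innovation bullet combines two ingredients: a verifiable reward (the maze has a checkable ground-truth solution) and a difficulty-aware curriculum. A minimal sketch of how these two pieces could fit together, assuming a binary exact-match reward and hypothetical stage thresholds (the class and parameter names are illustrative, not from the paper):

```python
def verified_reward(pred_moves, solution_moves):
    """Binary verifiable reward: 1.0 iff the predicted move sequence matches the solution."""
    return 1.0 if pred_moves == solution_moves else 0.0

class DifficultyCurriculum:
    """Advance to longer solution paths once the recent success rate clears a threshold."""

    def __init__(self, levels=(4, 6, 8, 10), threshold=0.7, window=100):
        self.levels = levels        # path lengths, easiest to hardest
        self.threshold = threshold  # success rate required to advance
        self.window = window        # number of rollouts per evaluation window
        self.stage = 0
        self.history = []

    @property
    def path_len(self):
        return self.levels[self.stage]

    def update(self, reward):
        """Record one rollout's reward; advance a stage when the window is full and successful."""
        self.history.append(reward)
        if len(self.history) >= self.window:
            rate = sum(self.history) / len(self.history)
            if rate >= self.threshold and self.stage < len(self.levels) - 1:
                self.stage += 1
            self.history = []
```

In an RLVR loop, each rollout's predicted path would be scored by `verified_reward` against the generated maze's solution, and the curriculum's current `path_len` would parameterize the next batch of mazes.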