CurveBench: A Benchmark for Exact Topological Reasoning over Nested Jordan Curves

📅 2026-05-13
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This study addresses the problem of inferring, from an image, the hierarchical region-containment relationships induced by nested Jordan curves, a fundamental yet underexplored task in visual topological reasoning. The work formulates the task as structured prediction: given an image, a model must recover the full rooted containment tree induced by the curves. The authors construct a multi-difficulty visual benchmark of 756 images of pairwise non-intersecting Jordan curves, each annotated with a rooted containment tree, together with an evaluation protocol that scores tree-generation accuracy. Experiments reveal significant limitations of current vision-language models: the strongest evaluated model, Gemini 3.1 Pro, achieves only 71.1% and 19.1% accuracy on the easy and hard subsets, respectively. A Qwen3-VL-8B model fine-tuned in an RLVR style reaches 33.3% accuracy on the easy subset, up from 2.8% for Qwen-3-VL-8B-Thinking, and exceeds GPT-5.4 and Claude Opus 4.5 under the same protocol. The remaining gap, especially on the hard subset, indicates that exact visual topological reasoning is far from solved.
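A structural-correctness check of this kind can be illustrated with an exact-match comparison of rooted trees up to isomorphism. The sketch below is an assumption about what such a check could look like, not the paper's evaluation protocol: the `Tree` representation, the function names, and the choice to treat children as unordered are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: node id -> list of child ids.
Tree = Dict[int, List[int]]


def canonical(tree: Tree, root: int) -> str:
    """Return a canonical parenthesis string for an unordered rooted tree.

    Two rooted trees are isomorphic exactly when their canonical strings
    are equal (the classic AHU encoding). Node ids are ignored.
    """
    child_codes = sorted(canonical(tree, child) for child in tree.get(root, []))
    return "(" + "".join(child_codes) + ")"


def exact_match(pred: Tree, pred_root: int, gold: Tree, gold_root: int) -> bool:
    """True if the predicted tree has the same shape as the gold tree."""
    return canonical(pred, pred_root) == canonical(gold, gold_root)


def accuracy(pairs: List[Tuple[Tree, int, Tree, int]]) -> float:
    """Fraction of (pred, pred_root, gold, gold_root) tuples that match exactly."""
    if not pairs:
        return 0.0
    hits = sum(exact_match(p, pr, g, gr) for p, pr, g, gr in pairs)
    return hits / len(pairs)


# Same shape with different node labels and child order: counts as correct.
gold = {0: [1, 2], 1: [3]}
pred = {0: [5, 7], 7: [9]}
# A chain of the same size: counts as incorrect.
chain = {0: [1], 1: [2], 2: [3]}
```

Under this assumption a prediction gets credit only when the whole tree is right, which is one reason exact-match accuracy can stay low even when most individual containment relations are recovered.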
📝 Abstract
We introduce CurveBench, a benchmark for hierarchical topological reasoning from visual input. CurveBench consists of 756 images of pairwise non-intersecting Jordan curves across easy, polygonal, topographic-inspired, maze-like, and dense counting configurations. Each image is annotated with a rooted tree encoding the containment relations between planar regions. We formulate the task as structured prediction: given an image, a model must recover the full rooted containment tree induced by the curves. Despite the visual simplicity of the task, the strongest evaluated model, Gemini 3.1 Pro, achieves only 71.1% tree-generation accuracy on CurveBench-Easy and 19.1% on CurveBench-Hard. We further demonstrate benchmark utility through RLVR-style fine-tuning of open-weight vision-language models. Our trained Qwen3-VL-8B model improves over Qwen-3-VL-8B-Thinking from 2.8% to 33.3% tree-generation accuracy on CurveBench-Easy, exceeding GPT-5.4 and Claude Opus 4.5 under our evaluation protocol. The remaining gap, especially on CurveBench-Hard, shows that exact topology-aware visual reasoning remains far from solved.
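To make the target structure concrete, the following minimal sketch builds a rooted containment tree when the curves are already available as polygons. This is an illustrative assumption: the benchmark supplies images rather than vertex lists, and the names `ROOT`, `point_in_polygon`, and `containment_tree` are hypothetical. Because the curves are pairwise non-intersecting, testing a single vertex of one curve against another curve is enough to decide containment.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]

ROOT = -1  # hypothetical id for the unbounded outer region


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting test: count crossings of a ray going right from pt."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # The edge straddles the horizontal line through pt.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def containment_tree(curves: List[Polygon]) -> Dict[int, List[int]]:
    """Map each curve index (and ROOT) to the curves immediately inside it."""
    n = len(curves)
    # containers[i] lists every curve that encloses curve i.
    containers = [
        [j for j in range(n) if j != i and point_in_polygon(curves[i][0], curves[j])]
        for i in range(n)
    ]
    tree: Dict[int, List[int]] = {ROOT: []}
    for i in range(n):
        tree[i] = []
    for i in range(n):
        if not containers[i]:
            parent = ROOT
        else:
            # The immediate parent is the enclosing curve that is itself
            # enclosed by the most curves, i.e. the innermost container.
            parent = max(containers[i], key=lambda j: len(containers[j]))
        tree[parent].append(i)
    return tree


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
mid = [(1, 1), (5, 1), (5, 5), (1, 5)]
inner = [(2, 2), (3, 2), (3, 3), (2, 3)]
side = [(6, 6), (8, 6), (8, 8), (6, 8)]
separate = [(20, 20), (22, 20), (22, 22), (20, 22)]
tree = containment_tree([outer, mid, inner, side, separate])
```

Here `outer` and `separate` hang off the root, `mid` and `side` sit inside `outer`, and `inner` sits inside `mid`. With explicit geometry the tree is easy to compute; the benchmark task is harder because a model must recover the same structure from pixels alone.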
Problem

Research questions and friction points this paper is trying to address.

topological reasoning
Jordan curves
containment hierarchy
structured prediction
visual reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

topological reasoning
Jordan curves
structured prediction
vision-language models
benchmark