JRDB-Reasoning: A Difficulty-Graded Benchmark for Visual Reasoning in Robotics

📅 2025-08-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing visual reasoning benchmarks lack formal definitions of reasoning complexity, controllable difficulty mechanisms for question generation, and structured intermediate reasoning annotations—hindering fine-grained evaluation of robots in dense crowd scenarios. Method: We propose JRDB-Reasoning: (1) a formal definition of visual reasoning complexity; (2) an adaptive query engine enabling task-specific, multi-level reasoning question generation with tunable difficulty; and (3) a vision-language evaluation framework built upon an extended JRDB dataset—augmented with human-object interaction and geometric relation annotations—and annotated with step-by-step reasoning chains. Contribution/Results: JRDB-Reasoning is the first benchmark to enable dynamic, interpretable assessment of VLMs and LLMs across varying reasoning depths. It significantly improves evaluation accuracy and controllability in crowd interaction scenarios, supporting rigorous, granular analysis of model reasoning capabilities.
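The adaptive query engine idea — composing atomic reasoning steps so that question difficulty scales with the length of the reasoning chain — can be sketched roughly as follows. This is a minimal illustrative mock-up, not the paper's actual implementation; all names (`Predicate`, `compose_query`, the example predicates) are assumptions.

```python
# Hypothetical sketch of a difficulty-graded query generator in the spirit of
# JRDB-Reasoning's adaptive query engine. Names and predicates are illustrative.
import random
from dataclasses import dataclass

@dataclass
class Predicate:
    """One atomic reasoning step, e.g. an attribute, geometric relation,
    or human-object interaction constraint."""
    text: str   # natural-language fragment for the question
    step: str   # intermediate annotation for the step-by-step reasoning chain

PREDICATES = [
    Predicate("wearing a backpack", "filter: attribute(backpack)"),
    Predicate("to the left of the door", "filter: geometry(left_of, door)"),
    Predicate("interacting with a laptop", "filter: interaction(laptop)"),
]

def compose_query(difficulty: int, seed: int = 0) -> dict:
    """Compose `difficulty` predicates into one question. More predicates
    means a longer reasoning chain, i.e. higher reasoning complexity."""
    rng = random.Random(seed)
    chosen = rng.sample(PREDICATES, k=min(difficulty, len(PREDICATES)))
    question = "Find the person " + " and ".join(p.text for p in chosen)
    return {
        "question": question,
        "workflow": [p.step for p in chosen],  # structured intermediate annotation
        "complexity": len(chosen),
    }

q = compose_query(difficulty=2, seed=1)
```

Here difficulty is simply the number of composed constraints, and the recorded `workflow` plays the role of the paper's step-by-step reasoning annotations, letting an evaluator check each intermediate step rather than only the final answer.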

📝 Abstract
Recent advances in Vision-Language Models (VLMs) and large language models (LLMs) have greatly enhanced visual reasoning, a key capability for embodied AI agents like robots. However, existing visual reasoning benchmarks often suffer from several limitations: they lack a clear definition of reasoning complexity, offer no control over question difficulty or task customization, and fail to provide structured, step-by-step reasoning annotations (workflows). To bridge these gaps, we formalize reasoning complexity, introduce an adaptive query engine that generates customizable questions of varying complexity with detailed intermediate annotations, and extend the JRDB dataset with human-object interaction and geometric relationship annotations to create JRDB-Reasoning, a benchmark tailored for visual reasoning in human-crowded environments. Our engine and benchmark enable fine-grained evaluation of visual reasoning frameworks and dynamic assessment of vision-language models across reasoning levels.
Problem

Research questions and friction points this paper is trying to address.

Defining and grading visual reasoning complexity in robotics
Generating customizable questions at controllable difficulty levels
Providing structured, step-by-step reasoning annotations for evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

A formal definition of reasoning complexity for visual tasks
An adaptive query engine for generating customizable, difficulty-tunable questions
An extension of JRDB with human-object interaction and geometric relation annotations