ReCoQA: A Benchmark for Tool-Augmented and Multi-Step Reasoning in Real Estate Question and Answering

Date: 2026-04-20

AI Summary
This work addresses the lack of benchmark support for hybrid reasoning workflows that integrate database queries with external API calls, a gap that hinders agents’ complex reasoning in multi-source, heterogeneous environments. To bridge this gap, the authors introduce ReCoQA, a large-scale benchmark comprising 29,270 multi-hop question-answering instances in the real estate domain, providing the first machine-verifiable supervision signals that include intent labels, SQL queries, and API invocations as intermediate reasoning steps. They further propose HIRE-Agent, a hierarchical framework employing an understand-plan-execute architecture, which coordinates a front-end parser, a planning supervisor, and execution experts to jointly reason over structured and unstructured data. Experimental results demonstrate that HIRE-Agent achieves strong performance on ReCoQA, validating the efficacy and necessity of hierarchical collaboration for tackling complex real-world tasks.
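
To make "machine-verifiable supervision for intermediate steps" concrete, the following is a minimal sketch of how an instance carrying an intent label, a SQL query, and API calls might be represented and checked step by step. The schema, the field names, the `nearby_search` API, and the exact-match scoring are all assumptions made for illustration; none of them is taken from ReCoQA's actual data format or evaluation protocol.

```python
from dataclasses import dataclass, field


@dataclass
class ApiCall:
    # Hypothetical representation of one external API invocation.
    name: str
    arguments: dict


@dataclass
class Instance:
    # Hypothetical schema: field names are illustrative, not ReCoQA's format.
    question: str
    intent: str
    sql: str
    api_calls: list = field(default_factory=list)


def normalize_sql(sql: str) -> str:
    # Crude normalization: collapse whitespace, drop a trailing semicolon,
    # and lowercase everything (this would also lowercase string literals).
    return " ".join(sql.strip().rstrip(";").split()).lower()


def score_steps(gold: Instance, pred: Instance) -> dict:
    # Exact-match check on each intermediate step, reported separately.
    return {
        "intent": gold.intent == pred.intent,
        "sql": normalize_sql(gold.sql) == normalize_sql(pred.sql),
        "api": gold.api_calls == pred.api_calls,
    }


gold = Instance(
    question="Which listings under 500k are within 1 km of a subway station?",
    intent="filter_then_nearby",
    sql="SELECT id FROM listings WHERE price < 500000;",
    api_calls=[ApiCall("nearby_search", {"category": "subway", "radius_m": 1000})],
)
pred = Instance(
    question=gold.question,
    intent="filter_then_nearby",
    sql="select id   from listings where price < 500000",
    api_calls=[ApiCall("nearby_search", {"category": "subway", "radius_m": 500})],
)

scores = score_steps(gold, pred)
print(scores)  # {'intent': True, 'sql': True, 'api': False}
```

Scoring each step separately shows where a prediction went wrong (here, the API arguments) rather than only whether the final answer matched.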

πŸ“ Abstract
Developing agents capable of navigating fragmented, multi-source information remains challenging, primarily due to the scarcity of benchmarks reflecting hybrid workflows combining database querying with external APIs. To bridge this gap, we introduce ReCoQA, a large-scale benchmark of 29,270 real-estate instances featuring machine-verifiable supervision for intermediate steps, including structured intent labels, SQL queries, and API calls. Complementarily, we propose HIRE-Agent, a hierarchical framework instantiating an understand-plan-execute architecture as a strong baseline. By orchestrating a Front-end parser, a planning Supervisor, and execution Specialists, HIRE-Agent effectively integrates heterogeneous evidence. Extensive experiments demonstrate that HIRE-Agent constitutes a strong baseline and substantiates the necessity of hierarchical collaboration for complex, real-world reasoning tasks.
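
The understand-plan-execute flow described above can be sketched as a toy pipeline. This is an illustrative sketch only, not HIRE-Agent's implementation: the keyword-based `parse`, the rule-based `plan`, the `listings` table, and the stubbed distance lookup standing in for an external API are all hypothetical.

```python
import sqlite3


def parse(question: str) -> dict:
    # "Understand": a toy keyword parser standing in for the front-end parser.
    q = question.lower()
    return {"needs_db": "listing" in q, "needs_api": "near" in q}


def plan(parsed: dict) -> list:
    # "Plan": the supervisor decides which specialists to call, in order.
    steps = []
    if parsed["needs_db"]:
        steps.append("sql")
    if parsed["needs_api"]:
        steps.append("api")
    return steps


def sql_specialist(conn: sqlite3.Connection) -> list:
    # Structured evidence: query a (hypothetical) listings table.
    rows = conn.execute(
        "SELECT id, price FROM listings WHERE price < ? ORDER BY id", (500000,)
    ).fetchall()
    return [{"id": r[0], "price": r[1]} for r in rows]


def api_specialist(listings: list) -> list:
    # External evidence: a stub in place of a real map or POI API.
    fake_distance_m = {1: 300, 2: 2500, 3: 800}
    return [l for l in listings if fake_distance_m.get(l["id"], 10**9) <= 1000]


def answer(question: str, conn: sqlite3.Connection) -> list:
    # "Execute": run the planned steps, passing evidence from one to the next.
    evidence = []
    for step in plan(parse(question)):
        if step == "sql":
            evidence = sql_specialist(conn)
        elif step == "api":
            evidence = api_specialist(evidence)
    return [e["id"] for e in evidence]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listings (id INTEGER PRIMARY KEY, price INTEGER)")
conn.executemany(
    "INSERT INTO listings VALUES (?, ?)",
    [(1, 450000), (2, 480000), (3, 900000)],
)
result = answer("Which listings under 500k are near a subway station?", conn)
print(result)  # [1]
```

The SQL step narrows the candidates to listings 1 and 2, and the stubbed API step keeps only listing 1, which is the kind of database-then-API chaining the benchmark is said to target.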
Problem

Research questions and friction points this paper is trying to address.

tool-augmented reasoning
multi-step reasoning
real estate QA
benchmark
hybrid workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

tool-augmented reasoning
multi-step reasoning
hierarchical agent architecture
real estate QA benchmark
structured supervision