RETQA: A Large-Scale Open-Domain Tabular Question Answering Dataset for Real Estate Sector

📅 2024-12-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the lack of large-scale Chinese tabular question answering (QA) datasets for real estate, this paper introduces RETQA, the first large-scale open-domain Chinese tabular QA dataset for the sector. It comprises 4,932 tables and 20,762 QA pairs across 16 sub-fields in three domains: property information, real estate company finance, and land auctions. To handle the dataset's three main challenges (long tables, open-domain retrieval, and multi-domain queries), the authors propose SLUTQA, a framework that combines large language model (LLM) in-context learning (ICL) with spoken language understanding tasks. Experiments show that SLUTQA significantly improves LLM performance on RETQA. Both the dataset and the implementation code are publicly released.

📝 Abstract
The real estate market relies heavily on structured data, such as property details, market trends, and price fluctuations. However, the lack of specialized Tabular Question Answering datasets in this domain limits the development of automated question-answering systems. To fill this gap, we introduce RETQA, the first large-scale open-domain Chinese Tabular Question Answering dataset for Real Estate. RETQA comprises 4,932 tables and 20,762 question-answer pairs across 16 sub-fields within three major domains: property information, real estate company finance information, and land auction information. Compared with existing tabular question answering datasets, RETQA poses greater challenges due to three key factors: long-table structures, open-domain retrieval, and multi-domain queries. To tackle these challenges, we propose the SLUTQA framework, which integrates large language models with spoken language understanding tasks to enhance retrieval and answering accuracy. Extensive experiments demonstrate that SLUTQA significantly improves the performance of large language models on RETQA through in-context learning. RETQA and SLUTQA provide essential resources for advancing tabular question answering research in the real estate domain, addressing critical challenges in open-domain and long-table question answering. The dataset and code are publicly available at https://github.com/jensen-w/RETQA.
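To make the open-domain and long-table challenges concrete, here is a minimal Python sketch of a generic retrieve-then-prompt loop: pick a table for the question, truncate it, and assemble an in-context learning prompt. It is not the SLUTQA implementation. The `Table` class, the token-overlap scorer, the prompt layout, and the English toy tables are all assumptions made for illustration, and whitespace tokenization would need to be replaced by a proper segmenter or embedding model for Chinese text.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """Hypothetical table container; not the RETQA data format."""
    title: str
    header: list[str]
    rows: list[list[str]]


def tokens(text: str) -> set[str]:
    # Whitespace tokens stand in for a real tokenizer or embedding model.
    return set(text.lower().split())


def retrieve(question: str, tables: list[Table], k: int = 1) -> list[Table]:
    """Rank tables by token overlap between the question and title + header."""
    q = tokens(question)

    def score(t: Table) -> int:
        return len(q & tokens(t.title + " " + " ".join(t.header)))

    return sorted(tables, key=score, reverse=True)[:k]


def serialize(table: Table, max_rows: int) -> str:
    """Flatten a table to pipe-separated lines, keeping at most max_rows rows."""
    lines = [" | ".join(table.header)]
    lines += [" | ".join(row) for row in table.rows[:max_rows]]
    return "\n".join(lines)


def build_prompt(question: str, table: Table,
                 demos: list[tuple[str, str]], max_rows: int = 20) -> str:
    """Assemble demonstrations, the retrieved table, and the question."""
    parts = [f"Q: {q}\nA: {a}" for q, a in demos]
    parts.append(f"Table: {table.title}\n{serialize(table, max_rows)}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Toy data, invented for this example only.
TABLES = [
    Table("property listings 2023", ["district", "project", "price"],
          [["A", "Sunrise Court", "52000"], ["B", "River Park", "47000"]]),
    Table("land auction results 2023", ["parcel", "bidder", "premium"],
          [["P-01", "Acme", "12%"]]),
]

if __name__ == "__main__":
    question = "which project in district A has the highest price"
    best = retrieve(question, TABLES)[0]
    print(build_prompt(question, best, demos=[("example question", "example answer")]))
```

The `max_rows` cut-off is the crudest possible answer to long tables; it only shows where such a step would sit in the pipeline, and the resulting prompt would still have to be sent to an LLM of your choice.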
Problem

Research questions and friction points this paper is trying to address.

Real Estate
Chinese Tabular QA
Dataset Limitations
Innovation

Methods, ideas, or system contributions that make the work stand out.

RETQA
SLUTQA
Real Estate Table Question Answering
Zhensheng Wang
School of Artificial Intelligence, Beijing Normal University, Beijing 100875, PR China
Wenmian Yang
Specially Appointed Associate Professor, Beijing Normal University at Zhuhai
Data Mining, Machine Learning, Natural Language Processing, Time Series
Kun Zhou
School of Artificial Intelligence, Beijing Normal University, Beijing 100875, PR China
Yiquan Zhang
Elmleaf Ltd., Shanghai 200082, PR China
Weijia Jia
FIEEE, Chair Professor, Beijing Normal University and UIC
Cyber Intelligent Computing, Networking