Toward Structured Knowledge Reasoning: Contrastive Retrieval-Augmented Generation on Experience

📅 2025-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit limited reasoning capability over structured data (e.g., tables, databases), primarily due to the scarcity of structured data in pretraining and rigid text-to-structure mapping mechanisms that fail to capture implicit relational semantics. Method: We propose CoRE, a novel framework featuring (i) an experience-memory generation mechanism based on Monte Carlo Tree Search (MCTS), enabling training-free, continual cross-modal knowledge transfer; and (ii) contrastive in-context learning integrated with retrieval-augmented generation (RAG) to construct generalizable structured representations. Contribution/Results: On Text-to-SQL and TableQA benchmarks, CoRE achieves average accuracy gains of +3.44% and +4.24%, respectively, with up to +17.2% improvement on the most challenging samples. The experience memory increases training data diversity by 8–9×, substantially mitigating the modality gap between unstructured text and structured data.
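The mechanism is easiest to see in code. Below is a minimal sketch of a CoRE-style inference step, assuming a generic text-embedding function and a prompt-based LLM: retrieve the stored experiences most similar to the query, then build a contrastive in-context prompt that pairs each exemplar with a correct answer and a near-miss negative. All names here (Experience, ExperienceMemory, build_contrastive_prompt) are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of contrastive retrieval-augmented ICL;
# not the authors' implementation.
from dataclasses import dataclass

import numpy as np


@dataclass
class Experience:
    question: str   # natural-language question
    answer: str     # correct SQL query / table answer
    contrast: str   # a near-miss wrong answer, used as the negative exemplar


class ExperienceMemory:
    """Stores MCTS-generated (question, answer, negative) triples."""

    def __init__(self, experiences, embed):
        self.experiences = experiences
        self.embed = embed  # any text-embedding function: str -> np.ndarray
        self.index = np.stack([embed(e.question) for e in experiences])

    def retrieve(self, query: str, k: int = 3):
        """Return the k stored experiences most similar to the query."""
        q = self.embed(query)
        sims = (self.index @ q) / (
            np.linalg.norm(self.index, axis=1) * np.linalg.norm(q)
        )
        return [self.experiences[i] for i in np.argsort(-sims)[:k]]


def build_contrastive_prompt(query: str, schema: str, memory: ExperienceMemory) -> str:
    """Contrastive ICL: each retrieved exemplar shows both an incorrect
    and a correct answer, so the model sees the decision boundary rather
    than positive demonstrations alone."""
    parts = []
    for e in memory.retrieve(query):
        parts.append(
            f"Question: {e.question}\n"
            f"Incorrect answer: {e.contrast}\n"
            f"Correct answer: {e.answer}\n"
        )
    parts.append(f"Schema: {schema}\nQuestion: {query}\nCorrect answer:")
    return "\n".join(parts)
```

The resulting prompt string would then be sent to any LLM completion API; the contrastive negatives are what distinguish this layout from plain RAG few-shot prompting.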

📝 Abstract
Large language models (LLMs) achieve strong performance on plain-text tasks but underperform on structured data like tables and databases. Potential challenges arise from their underexposure during pre-training and rigid text-to-structure transfer mechanisms. Unlike humans who seamlessly apply learned patterns across data modalities, LLMs struggle to infer implicit relationships embedded in tabular formats, especially in the absence of explicit structural guidance. To bridge this cognitive gap, we introduce Contrastive Retrieval-Augmented Generation on Experience (CoRE), a framework that builds experience memory representations and enhances generalization through contrastive In-Context Learning (ICL) to simulate human-like knowledge transfer. Experiments on Text-to-SQL and TableQA show CoRE significantly improves performance, achieving average gains of 3.44% and 4.24%, with up to 17.2% on challenging tasks. Our Monte Carlo Tree Search (MCTS)-generated Experience Memory expands training data 8–9×, enhancing diversity and domain coverage. This training-free and continual method propels LLMs toward structured knowledge expertise.
Problem

Research questions and friction points this paper is trying to address.

LLMs underperform on structured data like tables
LLMs struggle with implicit relationships in tabular formats
Need for human-like knowledge transfer in structured tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive Retrieval-Augmented Generation framework
Monte Carlo Tree Search expands training data 8–9× (sketched after this list)
Training-free continual learning method
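As referenced above, here is a hedged sketch of how the MCTS-driven expansion could work. The paper describes MCTS-based experience generation only at the level of this summary, so the propose (candidate next reasoning steps) and score (answer-quality reward) functions below are hypothetical stand-ins; the UCT selection rule itself is the standard one.

```python
# Illustrative UCT-style search for growing the experience memory;
# propose() and score() are hypothetical stand-ins.
import math
import random


class Node:
    def __init__(self, state, parent=None):
        self.state = state        # partial reasoning trace or partial SQL
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

    def uct(self, c: float = 1.4) -> float:
        """Standard UCT score: exploit high mean value, explore rare nodes."""
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )


def expand_experiences(seed: str, propose, score, iterations: int = 100):
    """Search outward from a seed question; high-scoring trajectories
    become new entries for the experience memory."""
    root = Node(seed)
    experiences = []
    for _ in range(iterations):
        # Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # Expansion: attach candidate continuations (e.g. SQL clauses).
        for step in propose(node.state):
            node.children.append(Node(node.state + step, parent=node))
        # Evaluation on one child, then backpropagation to the root.
        child = random.choice(node.children) if node.children else node
        reward = score(child.state)
        if reward >= 0.9:  # hypothetical threshold; keep strong trajectories
            experiences.append(child.state)
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent
    return experiences
```

In practice one would keep only complete, executable trajectories and deduplicate them; repeated runs over many seed questions are what could yield the reported 8–9× growth in training-data diversity.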
👥 Authors
Jiawei Gu · Sun Yat-sen University · Natural language processing, Multimodal reasoning
Ziting Xian · Platform and Content Group, Tencent
Yuanzhen Xie · Platform and Content Group, Tencent
Ye Liu · Platform and Content Group, Tencent
Enjie Liu · Platform and Content Group, Tencent
Ruichao Zhong · Platform and Content Group, Tencent
Mochi Gao · Platform and Content Group, Tencent
Yunzhi Tan · Tencent · Recommendation System, Machine Learning
Bo Hu · Platform and Content Group, Tencent
Zang Li · Platform and Content Group, Tencent