Toward Structured Knowledge Reasoning: Contrastive Retrieval-Augmented Generation on Experience

📅 2025-06-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit limited reasoning capability over structured data (e.g., tables, databases), primarily due to the scarcity of structured data in pretraining and rigid text-to-structure mapping mechanisms that fail to capture implicit relational semantics. Method: We propose CoRE, a novel framework featuring (i) an experience-memory generation mechanism based on Monte Carlo Tree Search (MCTS), enabling training-free, continual cross-modal knowledge transfer; and (ii) contrastive in-context learning integrated with retrieval-augmented generation (RAG) to construct generalizable structured representations. Contribution/Results: On Text-to-SQL and TableQA benchmarks, CoRE achieves average accuracy gains of +3.44% and +4.24%, respectively, with up to +17.2% improvement on the most challenging samples. The experience memory increases training data diversity by 8–9×, substantially mitigating the modality gap between unstructured text and structured data.
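The mechanism is easiest to see in code. Below is a minimal sketch of a CoRE-style inference step, assuming a generic text-embedding function and a prompt-based LLM: retrieve the stored experiences most similar to the query, then build a contrastive in-context prompt that pairs each exemplar with a correct answer and a near-miss negative. All names here (Experience, ExperienceMemory, build_contrastive_prompt) are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of contrastive retrieval-augmented ICL;
# not the authors' implementation.
from dataclasses import dataclass

import numpy as np


@dataclass
class Experience:
    question: str   # natural-language question
    answer: str     # correct SQL query / table answer
    contrast: str   # a near-miss wrong answer, used as the negative exemplar


class ExperienceMemory:
    """Stores MCTS-generated (question, answer, negative) triples."""

    def __init__(self, experiences, embed):
        self.experiences = experiences
        self.embed = embed  # any text-embedding function: str -> np.ndarray
        self.index = np.stack([embed(e.question) for e in experiences])

    def retrieve(self, query: str, k: int = 3):
        """Return the k stored experiences most similar to the query."""
        q = self.embed(query)
        sims = (self.index @ q) / (
            np.linalg.norm(self.index, axis=1) * np.linalg.norm(q)
        )
        return [self.experiences[i] for i in np.argsort(-sims)[:k]]


def build_contrastive_prompt(query: str, schema: str, memory: ExperienceMemory) -> str:
    """Contrastive ICL: each retrieved exemplar shows both an incorrect
    and a correct answer, so the model sees the decision boundary rather
    than positive demonstrations alone."""
    parts = []
    for e in memory.retrieve(query):
        parts.append(
            f"Question: {e.question}\n"
            f"Incorrect answer: {e.contrast}\n"
            f"Correct answer: {e.answer}\n"
        )
    parts.append(f"Schema: {schema}\nQuestion: {query}\nCorrect answer:")
    return "\n".join(parts)
```

The resulting prompt string would then be sent to any LLM completion API; the contrastive negatives are what distinguish this layout from plain RAG few-shot prompting.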

📝 Abstract
Large language models (LLMs) achieve strong performance on plain-text tasks but underperform on structured data like tables and databases. Potential challenges arise from their underexposure during pre-training and rigid text-to-structure transfer mechanisms. Unlike humans who seamlessly apply learned patterns across data modalities, LLMs struggle to infer implicit relationships embedded in tabular formats, especially in the absence of explicit structural guidance. To bridge this cognitive gap, we introduce Contrastive Retrieval-Augmented Generation on Experience (CoRE), a framework that builds experience memory representations and enhances generalization through contrastive In-Context Learning (ICL) to simulate human-like knowledge transfer. Experiments on Text-to-SQL and TableQA show CoRE significantly improves performance, achieving average gains of 3.44% and 4.24%, with up to 17.2% on challenging tasks. Our Monte Carlo Tree Search (MCTS)-generated Experience Memory expands training data 8–9×, enhancing diversity and domain coverage. This training-free and continual method propels LLMs toward structured knowledge expertise.
Problem

Research questions and friction points this paper is trying to address.

LLMs underperform on structured data like tables
LLMs struggle with implicit relationships in tabular formats
Need for human-like knowledge transfer in structured tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive Retrieval-Augmented Generation framework
Monte Carlo Tree Search expands training data 8–9× (sketched after this list)
Training-free continual learning method
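As referenced above, here is a hedged sketch of how the MCTS-driven expansion could work. The paper describes MCTS-based experience generation only at the level of this summary, so the propose (candidate next reasoning steps) and score (answer-quality reward) functions below are hypothetical stand-ins; the UCT selection rule itself is the standard one.

```python
# Illustrative UCT-style search for growing the experience memory;
# propose() and score() are hypothetical stand-ins.
import math
import random


class Node:
    def __init__(self, state, parent=None):
        self.state = state        # partial reasoning trace or partial SQL
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

    def uct(self, c: float = 1.4) -> float:
        """Standard UCT score: exploit high mean value, explore rare nodes."""
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )


def expand_experiences(seed: str, propose, score, iterations: int = 100):
    """Search outward from a seed question; high-scoring trajectories
    become new entries for the experience memory."""
    root = Node(seed)
    experiences = []
    for _ in range(iterations):
        # Selection: descend by UCT until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # Expansion: attach candidate continuations (e.g. SQL clauses).
        for step in propose(node.state):
            node.children.append(Node(node.state + step, parent=node))
        # Evaluation on one child, then backpropagation to the root.
        child = random.choice(node.children) if node.children else node
        reward = score(child.state)
        if reward >= 0.9:  # hypothetical threshold; keep strong trajectories
            experiences.append(child.state)
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent
    return experiences
```

In practice one would keep only complete, executable trajectories and deduplicate them; repeated runs over many seed questions are what could yield the reported 8–9× growth in training-data diversity.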
👥 Authors
Jiawei Gu · Sun Yat-sen University · Natural language processing, Multimodal reasoning
Ziting Xian · Platform and Content Group, Tencent
Yuanzhen Xie · Platform and Content Group, Tencent
Ye Liu · Platform and Content Group, Tencent
Enjie Liu · Platform and Content Group, Tencent
Ruichao Zhong · Platform and Content Group, Tencent
Mochi Gao · Platform and Content Group, Tencent
Yunzhi Tan · Tencent · Recommendation System, Machine Learning
Bo Hu · Platform and Content Group, Tencent
Zang Li · Platform and Content Group, Tencent