🤖 AI Summary
Large language models (LLMs) exhibit limited reasoning capability over structured data (e.g., tables, databases), primarily due to the scarcity of structured data in pretraining and rigid text-to-structure mapping mechanisms that fail to capture implicit relational semantics.
Method: We propose CoRE, a novel framework featuring (i) an experience-memory generation mechanism based on Monte Carlo Tree Search (MCTS), enabling training-free, continual cross-modal knowledge transfer; and (ii) contrastive in-context learning integrated with retrieval-augmented generation (RAG) to construct generalizable structured representations (illustrative sketches of both components appear below).
Contribution/Results: On Text-to-SQL and TableQA benchmarks, CoRE achieves average accuracy gains of +3.44% and +4.24%, respectively, with up to +17.2% improvement on the most challenging samples. The MCTS-generated experience memory expands the training data 8–9×, enhancing diversity and domain coverage and substantially mitigating the modality gap between unstructured text and structured data.
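To make the contrastive ICL-with-RAG component concrete, here is a minimal, self-contained sketch rather than CoRE's actual implementation: `Experience`, `ExperienceMemory`, and `build_contrastive_prompt` are hypothetical names, and a toy word-overlap scorer stands in for a real embedding retriever over the experience memory. The most similar exemplars are shown as patterns to imitate and the least similar ones as contrasting cases.

```python
# Illustrative sketch only: names and the lexical retriever are assumptions,
# not CoRE's actual code.
from dataclasses import dataclass


@dataclass
class Experience:
    question: str  # natural-language question answered previously
    schema: str    # serialized table/database schema
    answer: str    # verified SQL query or table answer


class ExperienceMemory:
    def __init__(self, experiences):
        self.experiences = list(experiences)

    @staticmethod
    def _overlap(a: str, b: str) -> float:
        # Jaccard overlap of word sets; a stand-in for embedding similarity.
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / max(1, len(wa | wb))

    def retrieve(self, query: str, k: int = 1):
        """Return the k most similar (positive) and k least similar (contrastive) exemplars."""
        ranked = sorted(self.experiences,
                        key=lambda e: self._overlap(query, e.question),
                        reverse=True)
        return ranked[:k], ranked[-k:]


def build_contrastive_prompt(query: str, schema: str, memory: ExperienceMemory) -> str:
    positives, negatives = memory.retrieve(query)
    parts = ["Answer the new question. Follow the SIMILAR examples; "
             "note how the CONTRASTING examples differ."]
    for tag, group in (("SIMILAR", positives), ("CONTRASTING", negatives)):
        for e in group:
            parts.append(f"[{tag}] Schema: {e.schema}\nQ: {e.question}\nA: {e.answer}")
    parts.append(f"[NEW] Schema: {schema}\nQ: {query}\nA:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    memory = ExperienceMemory([
        Experience("How many orders were placed in 2023?",
                   "orders(id, placed_at)",
                   "SELECT COUNT(*) FROM orders WHERE strftime('%Y', placed_at) = '2023';"),
        Experience("List each customer's total spend.",
                   "customers(id, name); orders(id, customer_id, amount)",
                   "SELECT c.name, SUM(o.amount) FROM customers c "
                   "JOIN orders o ON o.customer_id = c.id GROUP BY c.name;"),
        Experience("Which product has the highest price?",
                   "products(id, name, price)",
                   "SELECT name FROM products ORDER BY price DESC LIMIT 1;"),
    ])
    print(build_contrastive_prompt("How many orders were delivered late in 2023?",
                                   "orders(id, placed_at, delivered_at, due_at)",
                                   memory))
```

A real retriever would presumably score candidates with learned embeddings rather than lexical overlap; the overlap scorer here only keeps the example dependency-free.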
📝 Abstract
Large language models (LLMs) achieve strong performance on plain-text tasks but underperform on structured data such as tables and databases. These challenges stem from the underexposure of structured data during pre-training and from rigid text-to-structure transfer mechanisms. Unlike humans, who seamlessly apply learned patterns across data modalities, LLMs struggle to infer implicit relationships embedded in tabular formats, especially in the absence of explicit structural guidance. To bridge this cognitive gap, we introduce Contrastive Retrieval-Augmented Generation on Experience (CoRE), a framework that builds experience memory representations and enhances generalization through contrastive In-Context Learning (ICL) to simulate human-like knowledge transfer. Experiments on Text-to-SQL and TableQA show that CoRE significantly improves performance, achieving average gains of 3.44% and 4.24%, with up to 17.2% on challenging tasks. Our Monte Carlo Tree Search (MCTS)-generated Experience Memory expands the training data 8–9×, enhancing diversity and domain coverage. This training-free and continual method propels LLMs toward structured knowledge expertise.
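The MCTS-based experience-memory generation can likewise be sketched with a generic, textbook-style MCTS loop (UCT selection, expansion, random rollout, backpropagation) over a toy clause space. Everything here is an assumption for illustration: the action set, the `reward` function standing in for an execution-based verifier, and the `harvest_experiences` helper are not the paper's actual search states or rollout policy.

```python
# Toy MCTS sketch: ACTIONS, TARGET, reward(), and harvest_experiences() are
# invented for illustration; a real system would search over model-generated
# SQL/program steps and score rollouts with an execution-based verifier.
import math
import random

ACTIONS = ["SELECT COUNT(*)", "FROM orders", "WHERE year = 2023", "GROUP BY status"]
TARGET = ["SELECT COUNT(*)", "FROM orders", "WHERE year = 2023"]  # stand-in verified answer


def reward(seq):
    """Toy verifier: fraction of the target clause sequence reproduced in order."""
    return sum(a == b for a, b in zip(seq, TARGET)) / len(TARGET)


class Node:
    def __init__(self, state, parent=None):
        self.state = state          # partial clause sequence
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

    def untried(self):
        used = {c.state[-1] for c in self.children}
        return [a for a in ACTIONS if a not in self.state and a not in used]

    def uct_child(self, c=1.4):
        return max(self.children,
                   key=lambda n: n.value / n.visits
                   + c * math.sqrt(math.log(self.visits) / n.visits))


def mcts(iterations=200, max_len=len(TARGET)):
    root = Node([])
    for _ in range(iterations):
        node = root
        # Selection: descend while the current node is fully expanded.
        while node.children and not node.untried():
            node = node.uct_child()
        # Expansion: attach one unexplored clause if the sequence is incomplete.
        if node.untried() and len(node.state) < max_len:
            child = Node(node.state + [random.choice(node.untried())], parent=node)
            node.children.append(child)
            node = child
        # Simulation: random rollout to full length, then score it.
        rollout = list(node.state)
        while len(rollout) < max_len:
            rollout.append(random.choice([a for a in ACTIONS if a not in rollout]))
        r = reward(rollout)
        # Backpropagation: update value estimates along the path.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return root


def harvest_experiences(root, threshold=0.9):
    """Collect complete, high-value trajectories as experience-memory candidates."""
    out, stack = [], [root]
    while stack:
        n = stack.pop()
        if len(n.state) == len(TARGET) and n.visits and n.value / n.visits >= threshold:
            out.append(n.state)
        stack.extend(n.children)
    return out


if __name__ == "__main__":
    for trajectory in harvest_experiences(mcts()):
        print(" | ".join(trajectory))
```

Under this reading, trajectories that clear the verifier threshold would populate the experience memory that the contrastive retrieval step sketched above draws from.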