AI Summary
High-quality training data for logical reasoning tasks is scarce, and web-crawled data often exhibits unreliable logical structure, limiting downstream model performance. Method: We propose AMR-LDA, a logic-driven data augmentation framework that parses input text into Abstract Meaning Representation (AMR) graphs, performs controllable graph editing grounded in logical structure, and reconstructs semantically consistent, logic-faithful augmented text by generating from the edited graphs. Contribution/Results: AMR-LDA introduces the first AMR-graph-driven paradigm for logic-consistent data augmentation. It is architecture-agnostic: generative large language models are enhanced through prompt augmentation, without fine-tuning, and discriminative large language models through contrastive learning. By combining AMR reconstruction, prompt augmentation, and contrastive learning, AMR-LDA improves performance across seven logical reasoning benchmarks and ranks first on the ReClor leaderboard. The code and datasets are fully open-sourced.
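The parse, edit, and regenerate steps described above can be illustrated with a toy sketch. It does not use a real AMR parser or generator: the `Clause` and `Conditional` classes are hypothetical stand-ins for AMR subgraphs (with `negated` mirroring AMR's `:polarity -` attribute and `Conditional` mirroring a `:condition` edge), and contraposition is chosen here only as one example of a logic-preserving edit, not as a statement of the operations AMR-LDA actually implements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical stand-in for an AMR subgraph: subject, predicate, polarity."""
    subject: str
    predicate: str
    negated: bool = False  # plays the role of AMR's `:polarity -`


@dataclass(frozen=True)
class Conditional:
    """Hypothetical stand-in for an AMR graph containing a `:condition` edge."""
    condition: Clause
    result: Clause


def negate(clause: Clause) -> Clause:
    """Flip the polarity of a clause."""
    return Clause(clause.subject, clause.predicate, not clause.negated)


def contrapose(graph: Conditional) -> Conditional:
    """Apply contraposition: (A -> B) is logically equivalent to (not B -> not A)."""
    return Conditional(condition=negate(graph.result), result=negate(graph.condition))


def realize(clause: Clause) -> str:
    """Template-based text realization (a real system would use an AMR-to-text model)."""
    verb = "is not" if clause.negated else "is"
    return f"{clause.subject} {verb} {clause.predicate}"


def to_text(graph: Conditional) -> str:
    """Turn the toy graph back into a sentence."""
    return f"If {realize(graph.condition)}, then {realize(graph.result)}."


original = Conditional(Clause("Alan", "kind"), Clause("Bob", "clever"))
augmented = contrapose(original)
print(to_text(original))
print(to_text(augmented))
```

Because the edit happens on the structured representation rather than on the surface string, the rewritten sentence is logically equivalent to the original by construction, which is the property the augmentation relies on.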
Abstract
Combining large language models with logical reasoning enhances their capacity to address problems in a robust and reliable manner. Nevertheless, the intricate nature of logical reasoning makes it difficult to gather reliable data from the web to build comprehensive training datasets, which in turn hurts performance on downstream tasks. To address this, we introduce a novel logic-driven data augmentation approach, AMR-LDA. AMR-LDA converts the original text into an Abstract Meaning Representation (AMR) graph, a structured semantic representation that encapsulates the logical structure of the sentence, upon which operations are performed to generate logically modified AMR graphs. The modified AMR graphs are subsequently converted back into text to create augmented data. Notably, our methodology is architecture-agnostic and enhances both generative large language models, such as GPT-3.5 and GPT-4, through prompt augmentation, and discriminative large language models through contrastive learning with logic-driven data augmentation. Experiments show that the proposed method improves performance across seven downstream tasks, including reading comprehension requiring logical reasoning, textual entailment, and natural language inference. Furthermore, our method leads on the ReClor leaderboard at https://eval.ai/web/challenges/challenge-page/503/leaderboard/1347. The source code and data are publicly available at https://github.com/Strong-AI-Lab/Logical-Equivalence-driven-AMR-Data-Augmentation-for-Representation-Learning.
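As a rough illustration of the two uses named in the abstract, the sketch below shows how augmented sentences might be packaged for prompt augmentation and for contrastive learning. Everything in it is an assumption made for illustration: the prompt template, the `ContrastiveTriple` structure (anchor, logically equivalent positive, logically altered negative), and the use of a standard triplet margin loss are not taken from the source, which does not specify the prompt format or the training objective.

```python
import math
from typing import NamedTuple, Sequence


class ContrastiveTriple(NamedTuple):
    """Hypothetical training example for contrastive learning."""
    anchor: str    # original sentence
    positive: str  # logically equivalent rewrite
    negative: str  # rewrite whose logic has been changed


def augment_prompt(context: str, equivalent: str, question: str) -> str:
    """Build a prompt that shows the model the context and an equivalent restatement.

    The template is an assumption, not the format used by AMR-LDA.
    """
    return (
        f"Context: {context}\n"
        f"Logically equivalent restatement: {equivalent}\n"
        f"Question: {question}"
    )


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Standard triplet margin loss on embedding vectors (a swapped-in objective).

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


triple = ContrastiveTriple(
    anchor="If Alan is kind, then Bob is clever.",
    positive="If Bob is not clever, then Alan is not kind.",
    negative="If Alan is kind, then Bob is not clever.",
)
prompt = augment_prompt(triple.anchor, triple.positive, "Is Bob clever if Alan is kind?")
print(prompt)
```

In this framing, the generative path needs no parameter updates because the augmentation only changes the input text, while the discriminative path uses the triples as supervision for a learned encoder.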