Abstract Meaning Representation-Based Logic-Driven Data Augmentation for Logical Reasoning

📅 2023-05-21

🏛️ Annual Meeting of the Association for Computational Linguistics

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

High-quality training data for logical reasoning is scarce, and web-crawled text often has unreliable logical structure, which limits downstream model performance. Method: The paper proposes AMR-LDA, a logic-driven data augmentation framework that parses input text into Abstract Meaning Representation (AMR) graphs, applies logic-based edits to those graphs, and converts the modified graphs back into text to produce augmented data. Contribution/Results: AMR-LDA is architecture-agnostic. It improves generative large language models such as GPT-3.5 and GPT-4 through prompt augmentation, and discriminative large language models through contrastive learning on the augmented data. The paper reports improved performance on seven downstream tasks, including reading comprehension requiring logical reasoning, textual entailment, and natural language inference, and the top position on the ReClor leaderboard. The source code and data are publicly available.

📝 Abstract

Combining large language models with logical reasoning enhances their capacity to address problems in a robust and reliable manner. Nevertheless, the intricate nature of logical reasoning poses challenges when gathering reliable data from the web to build comprehensive training datasets, subsequently affecting performance on downstream tasks. To address this, we introduce a novel logic-driven data augmentation approach, AMR-LDA. AMR-LDA converts the original text into an Abstract Meaning Representation (AMR) graph, a structured semantic representation that encapsulates the logical structure of the sentence, upon which operations are performed to generate logically modified AMR graphs. The modified AMR graphs are subsequently converted back into text to create augmented data. Notably, our methodology is architecture-agnostic and enhances both generative large language models, such as GPT-3.5 and GPT-4, through prompt augmentation, and discriminative large language models through contrastive learning with logic-driven data augmentation. Empirical evidence underscores the efficacy of our proposed method with improvement in performance across seven downstream tasks, such as reading comprehension requiring logical reasoning, textual entailment, and natural language inference. Furthermore, our method leads on the ReClor leaderboard at https://eval.ai/web/challenges/challenge-page/503/leaderboard/1347. The source code and data are publicly available at https://github.com/Strong-AI-Lab/Logical-Equivalence-driven-AMR-Data-Augmentation-for-Representation-Learning.
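The graph-editing step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `Node` class, the `contrapose` and `gloss` helpers, and the choice of contraposition as the example edit are assumptions made for illustration. The sketch only borrows two ideas from AMR notation, a `:condition` role for "if" clauses and `:polarity -` for negation, and does not call a real AMR parser or generator.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Toy stand-in for an AMR node (hypothetical, not a real AMR library type)."""
    concept: str
    polarity: bool = True               # False models AMR's ":polarity -"
    condition: Optional["Node"] = None  # models AMR's ":condition" role


def negate(node: Node) -> Node:
    """Flip the polarity of a node; applying it twice restores the original."""
    return replace(node, polarity=not node.polarity)


def contrapose(node: Node) -> Node:
    """Rewrite "if A then B" as the logically equivalent "if not B then not A"."""
    if node.condition is None:
        return node  # nothing to rewrite without a conditional
    antecedent = node.condition
    consequent = replace(node, condition=None)
    return replace(negate(antecedent), condition=negate(consequent))


def gloss(node: Node) -> str:
    """Crude text rendering, standing in for a real graph-to-text model."""
    clause = node.concept if node.polarity else f"not {node.concept}"
    if node.condition is None:
        return clause
    return f"if {gloss(node.condition)}, then {clause}"


original = Node("wet-ground", condition=Node("rain"))
augmented = contrapose(original)

print(gloss(original))   # prints: if rain, then wet-ground
print(gloss(augmented))  # prints: if not wet-ground, then not rain
```

In a full pipeline, a text-to-AMR parser would produce the graph and a graph-to-text generator would replace `gloss`; the edit itself stays a small, checkable transformation on the graph.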

Problem

Research questions and friction points this paper is trying to address.

Enhances logical reasoning in large language models

Addresses data scarcity for logical reasoning training

Improves performance on multiple downstream reasoning tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Converts text to AMR graphs for augmentation

Modifies AMR graphs to create new data

Enhances LLMs via prompt augmentation and contrastive learning
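The two ways of using the augmented text can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: `PairExample`, `build_pairs`, and `augment_prompt` are hypothetical names, the label scheme (1 for logically equivalent, 0 otherwise) is assumed, and the prompt layout is only one plausible format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PairExample:
    """One sentence pair for contrastive-style training (hypothetical schema)."""
    anchor: str
    candidate: str
    label: int  # assumed convention: 1 = logically equivalent, 0 = not equivalent


def build_pairs(anchor: str, equivalents: List[str],
                non_equivalents: List[str]) -> List[PairExample]:
    """Pair the original sentence with positive and negative augmented sentences."""
    pairs = [PairExample(anchor, s, 1) for s in equivalents]
    pairs += [PairExample(anchor, s, 0) for s in non_equivalents]
    return pairs


def augment_prompt(context: str, question: str, equivalents: List[str]) -> str:
    """Append logically equivalent sentences to the context of a prompt."""
    extended_context = " ".join([context, *equivalents])
    return f"{extended_context}\nQuestion: {question}"


anchor = "If it rains, the ground is wet."
positives = ["If the ground is not wet, it does not rain."]
negatives = ["If the ground is wet, it does not rain."]

pairs = build_pairs(anchor, positives, negatives)
prompt = augment_prompt(anchor, "Is the ground wet when it rains?", positives)
print([p.label for p in pairs])  # prints: [1, 0]
print(prompt)
```

The pairs would feed a discriminative model's training loop, while the extended prompt would be sent unchanged to a generative model, so neither use requires changing the model architecture.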

🔎 Similar Papers

Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models