Can LLMs Interpret and Leverage Structured Linguistic Representations? A Case Study with AMRs

📅 2025-04-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates large language models' (LLMs) capacity to understand and leverage structured semantic representations, specifically Abstract Meaning Representation (AMR). We systematically evaluate AMR-enhanced prompting via linearized AMR encoding and zero-shot prompt engineering, using 8-bit quantized, instruction-tuned Llama 3.1 (8B), Phi-3, and Mistral 7B across both short-context (e.g., sentence-level) and long-context (e.g., dialogue summarization) tasks. Our key findings are: (1) AMR prompting yields substantial gains in long-context settings (e.g., zero-shot cosine similarity on SAMSum improves from 66.2% to 76.0% for Llama 3.1) but degrades performance in short-context tasks; and (2) LLMs can reconstruct text from AMRs with high fidelity, achieving up to 81.3% cosine similarity. This is the first study to empirically characterize the interaction between structured semantic representations and context length in LLMs, shedding light on the mechanisms underlying LLM semantic understanding.

📝 Abstract
This paper evaluates the ability of Large Language Models (LLMs) to leverage contextual information in the form of structured linguistic representations. Specifically, we examine the impact of encoding both short and long contexts using Abstract Meaning Representation (AMR) structures across a diverse set of language tasks. We perform our analysis using 8-bit quantized and instruction-tuned versions of Llama 3.1 (8B), Phi-3, and Mistral 7B. Our results indicate that, for tasks involving short contexts, augmenting the prompt with the AMR of the original language context often degrades the performance of the underlying LLM. However, for tasks that involve long contexts, such as dialogue summarization in the SAMSum dataset, this enhancement improves LLM performance, for example, by increasing the zero-shot cosine similarity score of Llama 3.1 from 66.2% to 76%. This improvement is more evident in the newer and larger LLMs, but does not extend to the older or smaller ones. In addition, we observe that LLMs can effectively reconstruct the original text from a linearized AMR, achieving a cosine similarity of 81.3% in the best-case scenario.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to use structured linguistic representations (AMRs).
Assessing AMR impact on short vs. long context language tasks.
Testing text reconstruction from linearized AMRs by LLMs.
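The reconstruction test above hinges on "linearizing" an AMR, i.e., flattening its multi-line PENMAN graph into a single line the LLM can read inside a prompt. A minimal sketch of that step, assuming plain whitespace normalization (the paper may use a different linearization scheme, and the prompt wording here is hypothetical):

```python
import re

def linearize_amr(penman_str: str) -> str:
    """Collapse a multi-line PENMAN AMR graph into one line by
    normalizing all whitespace runs to single spaces."""
    return re.sub(r"\s+", " ", penman_str).strip()

def reconstruction_prompt(linear_amr: str) -> str:
    """Zero-shot prompt asking the model to recover the original
    sentence from its linearized AMR (hypothetical wording)."""
    return (
        "The following is a linearized Abstract Meaning "
        f"Representation (AMR) of a sentence:\n{linear_amr}\n"
        "Generate the original sentence."
    )

# PENMAN AMR for "The boy wants to go."
amr = """
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-02
            :ARG0 b))
"""
print(linearize_amr(amr))
# → (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))
```

The reconstructed sentence is then compared against the original via embedding cosine similarity, which is where the reported 81.3% best-case figure comes from.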
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses AMR structures for linguistic representation
Applies 8-bit quantized instruction-tuned LLMs
Enhances long-context tasks with AMR augmentation
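The AMR-augmentation idea in the last bullet amounts to appending linearized AMR graphs of the context to the prompt before the task instruction. A minimal sketch for the dialogue-summarization case, assuming a simple template (the paper's exact prompt wording is not given here, so this phrasing is illustrative):

```python
def amr_augmented_prompt(dialogue: str, linear_amrs: list[str]) -> str:
    """Build a zero-shot summarization prompt that augments the raw
    dialogue with per-utterance linearized AMRs (hypothetical template)."""
    amr_block = "\n".join(linear_amrs)
    return (
        "Dialogue:\n" + dialogue + "\n\n"
        "Linearized AMR graphs of the dialogue:\n" + amr_block + "\n\n"
        "Summarize the dialogue in one or two sentences."
    )

prompt = amr_augmented_prompt(
    "Amy: are you coming?\nBob: yes, in 10 minutes",
    ["(c / come-01 :ARG1 (y / you))",
     "(c / come-01 :ARG1 (i / i) :time (a / after :quant 10))"],
)
```

Per the results above, this augmentation helps on long contexts such as SAMSum dialogues but tends to hurt on short, sentence-level tasks, where the AMR mostly duplicates the input.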