Enhancing multimodal analogical reasoning with Logic Augmented Generation

📅 2025-04-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automatic extraction and interpretation of implicit analogical knowledge, particularly cross-modal metaphors, in natural language remains challenging due to poor interpretability and limited grounding in embodied experience. Method: This paper applies Logic-Augmented Generation (LAG), a framework that integrates structured semantic knowledge graphs with generative reasoning via logic-guided, multi-stage prompting. LAG enables traceable, cross-domain, metaphor-level analogy modeling while mitigating large language models' lack of physical-world experience. It comprises knowledge graph construction, multimodal metaphor detection, and interpretability-aware metaphor understanding evaluation. Results: LAG achieves state-of-the-art performance across three metaphor-related tasks on four benchmarks; its visual metaphor understanding accuracy surpasses human-level performance; and it delivers end-to-end interpretable reasoning paths. A current limitation is reduced generalizability to domain-specific metaphors.
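
To make the stage structure concrete, here is a minimal sketch of what logic-guided, multi-stage prompting over a semantic knowledge graph can look like. The stage ordering follows the summary above; `call_llm` is a hypothetical stand-in for whatever model endpoint is used, and the prompt wording is our illustration, not taken from the paper.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion endpoint; plug in a real client."""
    raise NotImplementedError("replace with your model API of choice")


def lag_pipeline(text: str, image_caption: str | None = None) -> dict:
    # Stage 1: build an explicit semantic knowledge graph from the input.
    explicit_triples = call_llm(
        "Extract (subject, predicate, object) triples covering the explicit "
        f"semantics of this text:\n{text}"
    )

    # Stage 2: metaphor detection, conditioned on the structured graph so the
    # decision is anchored to explicit evidence rather than surface wording.
    detection = call_llm(
        f"Given these triples:\n{explicit_triples}\n"
        + (f"And this image caption: {image_caption}\n" if image_caption else "")
        + "Is a (cross-modal) metaphor present? Answer yes/no and name the "
          "source and target domains."
    )

    # Stage 3: interpretation, requested as *extended* triples with
    # justifications, which is what keeps the reasoning path traceable.
    interpretation = call_llm(
        f"Detection result: {detection}\nExtend the graph with triples that "
        "state the implicit analogical meaning, justifying each new triple."
    )

    return {
        "explicit_triples": explicit_triples,
        "detection": detection,
        "interpretation": interpretation,
    }
```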

📝 Abstract
Recent advances in Large Language Models have demonstrated their capabilities across a variety of tasks. However, automatically extracting implicit knowledge from natural language remains a significant challenge, as machines lack active experience with the physical world. Given this scenario, semantic knowledge graphs can serve as conceptual spaces that guide the automated text generation reasoning process to achieve more efficient and explainable results. In this paper, we apply a logic-augmented generation (LAG) framework that leverages the explicit representation of a text through a semantic knowledge graph and applies it in combination with prompt heuristics to elicit implicit analogical connections. This method generates extended knowledge graph triples representing implicit meaning, enabling systems to reason on unlabeled multimodal data regardless of the domain. We validate our work through three metaphor detection and understanding tasks across four datasets, as they require deep analogical reasoning capabilities. The results show that this integrated approach surpasses current baselines, performs better than humans in understanding visual metaphors, and enables more explainable reasoning processes, though it still has inherent limitations in metaphor understanding, especially for domain-specific metaphors. Furthermore, we present a thorough error analysis, discussing issues with metaphorical annotations and current evaluation methods.
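
As a concrete illustration of the "extended knowledge graph triples" the abstract describes, the runnable sketch below shows one plausible data shape: explicit triples extracted from the text plus generated implicit triples that each carry a justification, which is what makes the reasoning path inspectable. The example metaphor and field names are our own assumptions, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtendedTriple:
    subject: str
    predicate: str
    obj: str
    implicit: bool = False   # True if generated, i.e. not stated in the text
    justification: str = ""  # why the system believes this triple holds


def reasoning_path(triples: list[ExtendedTriple]) -> str:
    """Render explicit evidence and implicit inferences as one inspectable trace."""
    lines = []
    for t in triples:
        tag = "IMPLICIT" if t.implicit else "EXPLICIT"
        note = f"  <- {t.justification}" if t.justification else ""
        lines.append(f"[{tag}] ({t.subject}, {t.predicate}, {t.obj}){note}")
    return "\n".join(lines)


# Toy graph for the metaphor "time is money" (our example, not the paper's).
graph = [
    ExtendedTriple("time", "isComparedTo", "money"),
    ExtendedTriple("money", "hasProperty", "scarce"),
    ExtendedTriple("time", "hasProperty", "scarce", implicit=True,
                   justification="property mapped from source domain (money)"),
    ExtendedTriple("wasting time", "isAnalogousTo", "wasting money", implicit=True,
                   justification="relational mapping across domains"),
]

print(reasoning_path(graph))
```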
Problem

Research questions and friction points this paper is trying to address.

Enhancing analogical reasoning in multimodal data using logic-augmented generation
Extracting implicit knowledge from text via semantic knowledge graphs
Improving metaphor detection and understanding through explainable reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Logic-augmented generation for analogical reasoning
Semantic knowledge graphs guide text generation
Prompt heuristics elicit implicit connections (see the sketch after this list)
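
As referenced in the last item above, here is a minimal sketch of what such prompt heuristics might look like. The templates are illustrative assumptions; the paper's actual heuristics and wording are not reproduced here.

```python
# Illustrative heuristic prompt templates (our assumptions, not the paper's).
HEURISTICS = {
    "property_transfer":
        "In '{sentence}', '{target}' is framed as '{source}'. Which properties "
        "of '{source}' are implicitly transferred to '{target}'?",
    "relational_mapping":
        "For '{sentence}', list relations that hold in the '{source}' domain "
        "and give the corresponding relation in the '{target}' domain.",
    "literalness_check":
        "Which transferred properties are literally false of '{target}'? "
        "Those mark the implicit, metaphorical content.",
}


def build_prompts(sentence: str, target: str, source: str) -> list[str]:
    """Instantiate every heuristic for one candidate metaphor."""
    return [tpl.format(sentence=sentence, target=target, source=source)
            for tpl in HEURISTICS.values()]


for prompt in build_prompts("Her words were daggers.",
                            target="words", source="daggers"):
    print("-", prompt)
```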