🤖 AI Summary
This work addresses a critical yet previously unexamined issue in tool-augmented large language models (LLMs): performance degradation on STEM tasks caused by conflicts between a model’s internal parametric knowledge and externally retrieved tool knowledge. The authors formally define this phenomenon as “Tool-Memory Conflict” (TMC) and systematically investigate which of the two knowledge sources LLMs prioritize across diverse scenarios. Through comprehensive evaluation of prevailing mitigation strategies—including prompt engineering and retrieval-augmented generation (RAG)—the study demonstrates that current approaches consistently fail to resolve TMC, with particularly pronounced performance deterioration in high-precision STEM contexts. These findings underscore the pervasiveness and severity of TMC and offer crucial insights for the design of knowledge-coordination mechanisms in future tool-augmented language models.
📝 Abstract
Tool-augmented large language models (LLMs) power many applications, but they are prone to knowledge conflicts. In this paper, we identify a new type of knowledge conflict -- Tool-Memory Conflict (TMC), in which a tool-augmented LLM's internal parametric knowledge contradicts the knowledge returned by external tools. We find that existing LLMs, though powerful, suffer from TMC, especially on STEM-related tasks. We also uncover that, under different conditions, tool knowledge and parametric knowledge may be prioritized differently. We then evaluate existing conflict-resolution techniques, including prompting-based and RAG-based methods. Results show that none of these approaches can effectively resolve tool-memory conflicts.