🤖 AI Summary
This study investigates whether large language models (LLMs) prioritize externally provided explicit label definitions or rely predominantly on internal parametric knowledge during reasoning. To address this, we conduct controlled experiments across general-purpose benchmarks (e.g., BoolQ, MultiRC) and domain-specific benchmarks (e.g., MedQA, SciTail), systematically evaluating model adherence to human-annotated versus LLM-generated label definitions. Results reveal that external definition integration is neither robust nor consistent: in general tasks, models heavily default to internal representations, whereas in domain-specific tasks, external definitions improve accuracy (by +2.1–5.7%) and enhance decision interpretability. We identify, for the first time, a "task-sensitivity" phenomenon in definition adoption, propose a novel quantitative metric for measuring definition adherence, and underscore the critical importance of modeling knowledge fusion mechanisms to achieve controllable, reliable reasoning.
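The summary does not spell out how definition adherence is quantified; the sketch below is a rough, hypothetical illustration (not the paper's metric), assuming a swap-based setup in which adherence is the fraction of predictions that flip when two labels' definitions are exchanged, i.e., answers that follow the external definitions rather than the model's internal priors. The function name and label strings are illustrative assumptions.

```python
from typing import Sequence

def definition_adherence(preds_original: Sequence[str],
                         preds_swapped: Sequence[str]) -> float:
    """Fraction of predictions that change when two labels' definitions are swapped.

    If a model genuinely reads the external definitions, swapping the
    definitions of two labels should flip its predictions accordingly;
    a model relying on parametric knowledge keeps its original answers,
    so the score stays low.
    """
    if len(preds_original) != len(preds_swapped):
        raise ValueError("Prediction lists must be the same length")
    changed = sum(a != b for a, b in zip(preds_original, preds_swapped))
    return changed / len(preds_original)

# Example: 3 of 4 predictions follow the swapped definitions -> adherence 0.75
print(definition_adherence(
    ["entails", "neutral", "entails", "neutral"],
    ["neutral", "entails", "neutral", "neutral"],
))
```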
Abstract
Do LLMs genuinely incorporate external definitions, or do they primarily rely on their parametric knowledge? To address this question, we conduct controlled experiments across multiple explanation benchmark datasets (general and domain-specific) and label definition conditions, including expert-curated, LLM-generated, perturbed, and swapped definitions. Our results reveal that while explicit label definitions can enhance accuracy and explainability, their integration into an LLM's task-solving process is neither guaranteed nor consistent: models often default to their internal representations, particularly in general tasks, whereas domain-specific tasks benefit more from explicit definitions. These findings underscore the need for a deeper understanding of how LLMs process external knowledge alongside their pre-existing parametric capabilities.