🤖 AI Summary
This work aims to improve how large language models leverage relational knowledge for analogical reasoning and semantic understanding. By employing causal mediation analysis to identify critical function vectors and fine-tuning them on a minimal set of word-pair examples, the approach significantly improves performance on relational word-completion tasks. A composite function vector weighting and fusion mechanism is then introduced to support complex analogical reasoning. By integrating activation patching with function vector modulation, the method achieves substantial gains on cognitive science benchmarks and SAT-style analogy tasks. It not only boosts the accuracy of relational word decoding but also aligns closely with human judgments of semantic similarity, while improving model interpretability.
📝 Abstract
Representing relations between concepts is a core prerequisite for intelligent systems to make sense of the world. Recent work using causal mediation analysis has shown that a small set of attention heads encodes task representations during in-context learning, captured in a compact representation known as the function vector. We show that fine-tuning function vectors with only a small set of examples (about 20 word pairs) yields better performance on relation-based word-completion tasks than using the original vectors derived from causal mediation analysis. These improvements hold for both small and large language models. Moreover, the fine-tuned function vectors yield improved decoding performance for relation words and show stronger alignment with human similarity judgments of semantic relations. Next, we introduce the composite function vector, a weighted combination of fine-tuned function vectors, to extract relational knowledge and support analogical reasoning. At inference time, inserting this composite vector into LLM activations markedly enhances performance on challenging analogy problems drawn from cognitive science and SAT benchmarks. Our results highlight the potential of activation patching as a controllable mechanism for encoding and manipulating relational knowledge, advancing both the interpretability and reasoning capabilities of large language models.
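To make the composite-vector idea concrete, the sketch below illustrates the two operations the abstract describes: forming a weighted combination of per-relation function vectors, and adding the result to a hidden state at inference time (activation patching). All names, shapes, and the additive patching form are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

HIDDEN_DIM = 8  # toy hidden size; real LLMs use thousands of dimensions

# Hypothetical fine-tuned function vectors for two relations
# (in practice these would be extracted from attention-head outputs).
rng = np.random.default_rng(0)
fv_antonym = rng.normal(size=HIDDEN_DIM)
fv_category = rng.normal(size=HIDDEN_DIM)

def composite_function_vector(vectors, weights):
    """Weighted combination of fine-tuned function vectors.

    Weights are normalized to sum to 1 so the composite stays on the
    same scale as its components (an assumption of this sketch).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * v for wi, v in zip(w, vectors))

def patch_activation(hidden_state, fv, alpha=1.0):
    """Activation patching, sketched as adding a scaled function
    vector to a hidden state at a chosen layer during the forward pass."""
    return hidden_state + alpha * fv

composite = composite_function_vector([fv_antonym, fv_category], [0.7, 0.3])
hidden = np.zeros(HIDDEN_DIM)          # stand-in for a layer's residual stream
patched = patch_activation(hidden, composite)
```

In a real setting, `patch_activation` would run inside a forward hook at the layer identified by causal mediation analysis; the toy arrays here only show the vector arithmetic involved.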