🤖 AI Summary
This study investigates the intrinsic mechanisms of large language models (LLMs) in passage reranking, focusing on how their implicit neural representations correspond to hand-crafted features and influence ranking decisions. We propose the first fine-grained, multi-dimensional neuron probing framework tailored for reranking LLMs, integrating intra-layer activation decoding, multi-granularity feature engineering—including query-document interaction, document structure, and LLM-specific representations—and distributional robustness diagnostics. Our analysis reveals that while semantic and interaction-related features are explicitly encoded, several critical features remain notably absent. We further uncover distinct neural response patterns to highly versus marginally relevant passages, as well as to out-of-distribution queries. All code and experimental scripts are publicly released, establishing a systematic, empirically grounded paradigm for interpreting LLM-based ranking behavior.
📝 Abstract
Transformer networks, especially those whose performance is on par with GPT models, are renowned for their powerful feature-extraction capabilities. However, the nature of these features and their correlation with human-engineered ones remain unclear. In this study, we delve into the mechanistic workings of state-of-the-art, fine-tuning-based passage-reranking transformer networks. Our approach involves a probing-based, layer-by-layer analysis of neurons within ranking LLMs to identify individual or groups of known human-engineered and semantic features within the network's activations. We explore a wide range of features, including lexical, document-structure, query-document interaction, advanced semantic, and LLM-specific features, to gain a deeper understanding of the underlying mechanisms that drive ranking decisions in LLMs. Our results reveal a set of features that are prominently represented in LLM activations, as well as others that are notably absent. Additionally, we observe distinct behaviors of LLMs when processing low- versus high-relevance queries and when encountering out-of-distribution query and document sets. By examining these features within activations, we aim to enhance the interpretability and performance of LLMs in ranking tasks. Our findings provide valuable insights for the development of more effective and transparent ranking models, with significant implications for the broader information retrieval community. All scripts and code necessary to replicate our findings are made available.
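The core of the probing analysis described above can be sketched as follows: fit a simple linear probe that maps a layer's activations to a hand-crafted feature and measure how well the feature is decodable. This is a minimal illustration with synthetic stand-in data, not the paper's released code; the activation matrix, the BM25 target, and the probe choice (ridge regression) are all assumptions for the sketch.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins: hidden-state activations from one transformer layer
# (n_pairs query-document pairs x d_model dimensions) and a hand-crafted
# target feature, e.g. a BM25 score, computed for each pair.
n_pairs, d_model = 2000, 256
activations = rng.normal(size=(n_pairs, d_model))

# For the sketch, assume the feature is linearly decodable from a small
# group of neurons (the first 10 dimensions), plus noise.
true_weights = np.zeros(d_model)
true_weights[:10] = rng.normal(size=10)
bm25_scores = activations @ true_weights + 0.1 * rng.normal(size=n_pairs)

X_tr, X_te, y_tr, y_te = train_test_split(
    activations, bm25_scores, test_size=0.25, random_state=0
)

# Linear probe: if a ridge regressor predicts the feature from held-out
# activations, the layer encodes that feature (to the measured degree).
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
score = r2_score(y_te, probe.predict(X_te))
print(f"layer probe R^2 = {score:.3f}")
```

Repeating this fit per layer and per feature yields the layer-by-layer picture the study reports: features with high probe scores are "prominently represented", while features no probe can recover are the "notably absent" ones.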