🤖 AI Summary
Existing federated retrieval methods struggle with ambiguous queries in cross-domain settings, which degrades retrieval quality and downstream generation performance. To address this, we propose a Dynamic Information Flow-guided Multi-Prototype Alignment Search framework. First, we introduce dynamic information flow analysis into federated retrieval: gradient signals and Shapley values are combined to trace neuron activation paths, enabling fine-grained query intent identification and sub-domain boundary detection. Second, we model cross-source semantic alignment via multi-prototype contrastive learning. Evaluated on five benchmarks, our method achieves up to 14.37% higher knowledge classification accuracy, 5.38% higher retrieval recall, and 6.45% higher downstream question-answering accuracy than state-of-the-art approaches, improving federated retrieval for ambiguous cross-domain queries.
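To make the Shapley-value attribution mentioned above concrete, here is a minimal sketch of exact Shapley values over a handful of "players" (for instance, groups of neurons). It is not the paper's implementation: the function `shapley_values`, the toy scoring function `toy_score`, and the group names `a`, `b`, `c` are all hypothetical, and exact enumeration is only feasible for very small sets (real attribution over many neurons would need sampling or another approximation).

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value_fn):
    """Exact Shapley values by enumerating every coalition (small sets only)."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition s.
                phi[p] += weight * (value_fn(s | {p}) - value_fn(s))
    return phi


def toy_score(active):
    """Hypothetical intent score: c helps alone, a and b only help together."""
    score = 0.0
    if "c" in active:
        score += 1.0
    if "a" in active and "b" in active:
        score += 2.0
    return score


phi = shapley_values(["a", "b", "c"], toy_score)
```

In this toy setup the joint contribution of `a` and `b` is split evenly between them, and the attributions sum to the score of the full set, which is the efficiency property that makes Shapley values attractive for attribution.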
📝 Abstract
Federated Retrieval (FR) routes queries across multiple external knowledge sources to mitigate LLM hallucinations when the necessary external knowledge is distributed. However, existing methods struggle to retrieve high-quality, relevant documents for ambiguous queries, especially in cross-domain scenarios, which significantly limits their effectiveness in supporting downstream generation tasks. Inspired by dynamic information flow (DIF), we propose DFAMS, a novel framework that leverages DIF to identify latent query intents and construct semantically aligned knowledge partitions for accurate retrieval across heterogeneous sources. Specifically, DFAMS probes the DIF in LLMs by using gradient signals from a few annotated queries and applying Shapley value-based attribution to trace the neuron activation paths associated with intent recognition and subdomain boundary detection. DFAMS then uses DIF to train an alignment module via multi-prototype contrastive learning, enabling fine-grained intra-source modeling and inter-source semantic alignment across knowledge bases. Experimental results across five benchmarks show that DFAMS outperforms advanced FR methods by up to 14.37% in knowledge classification accuracy, 5.38% in retrieval recall, and 6.45% in downstream QA accuracy, demonstrating its effectiveness in complex FR scenarios.
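The abstract does not spell out the multi-prototype contrastive objective, so the following is only an illustrative sketch under stated assumptions: each knowledge source is represented by several prototype vectors, a query is scored against a source by its best-matching prototype (cosine similarity), and an InfoNCE-style softmax over sources gives the loss. The names `cosine` and `multi_prototype_loss`, the max-over-prototypes scoring, and the temperature value are hypothetical choices, not details taken from DFAMS.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def multi_prototype_loss(query, sources, positive, temperature=0.1):
    """InfoNCE-style loss where each source is scored by its closest prototype.

    sources: list of sources, each a list of prototype vectors (assumed layout).
    positive: index of the source the query should be routed to.
    """
    logits = [
        max(cosine(query, proto) for proto in protos) / temperature
        for protos in sources
    ]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_norm - logits[positive]


# Two toy sources with two prototypes each; the query lies close to source 0.
toy_sources = [
    [[1.0, 0.0], [0.8, 0.6]],
    [[0.0, 1.0], [-1.0, 0.0]],
]
toy_query = [1.0, 0.1]
loss_correct = multi_prototype_loss(toy_query, toy_sources, positive=0)
loss_wrong = multi_prototype_loss(toy_query, toy_sources, positive=1)
```

Keeping several prototypes per source lets one source cover multiple sub-domains, while the softmax over sources pushes a query away from the prototypes of unrelated knowledge bases. In the toy example, labeling the query with the nearby source yields a much smaller loss than labeling it with the distant one.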