🤖 AI Summary
Complex tasks requiring integration of multi-source evidence necessitate compositional retrieval—selecting and combining context fragments across multiple steps.
Method: We propose a serialized compositional retrieval paradigm that formalizes retrieval as a Markov Decision Process (MDP), iteratively selecting and composing contextual snippets to construct information-complete prompts. Our approach introduces a novel tri-encoder architecture to explicitly model cross-sample dependencies and designs a structural consistency reward aligned with LLM preferences, optimized end-to-end via supervised pretraining followed by reinforcement fine-tuning.
Contribution/Results: Evaluated on multi-hop reasoning and code synthesis benchmarks, our method significantly outperforms single-step retrieval baselines. It demonstrates both effectiveness in constructing complex, task-relevant contexts and strong generalization across diverse compositional reasoning tasks, validating the paradigm’s utility for evidence aggregation in advanced language understanding and generation.
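The summary does not spell out how the "structural consistency reward" is computed. One plausible reading for the code-synthesis setting is a reward based on how closely the structure of a generated program matches a reference program, independent of surface details like identifier names. The sketch below illustrates that idea using Python AST node-type sequences; the function names and the specific similarity measure are illustrative assumptions, not the paper's actual reward.

```python
import ast
from difflib import SequenceMatcher

def node_types(source: str) -> list[str]:
    """Flatten a program's AST into a sequence of node-type names.
    Identifier names and literal values are deliberately ignored."""
    return [type(n).__name__ for n in ast.walk(ast.parse(source))]

def structural_reward(generated: str, reference: str) -> float:
    """Hypothetical reward in [0, 1]: similarity of the two programs'
    AST node-type sequences (a stand-in for structural correspondence)."""
    return SequenceMatcher(None, node_types(generated),
                           node_types(reference)).ratio()

# Same structure with different identifiers/constants → maximal reward
print(structural_reward("def f(x): return x + 1",
                        "def g(y): return y + 2"))  # → 1.0
```

A reward like this rewards the retriever for assembling contexts that lead the LLM toward structurally correct programs, even before exact-match metrics would fire.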
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across numerous tasks, yet they often rely on external context to handle complex tasks. While retrieval-augmented frameworks traditionally focus on selecting top-ranked documents in a single pass, many real-world scenarios demand compositional retrieval, where multiple sources must be combined in a coordinated manner. In this work, we propose a tri-encoder sequential retriever that models this process as a Markov Decision Process (MDP), decomposing the probability of retrieving a set of elements into a sequence of conditional probabilities and allowing each retrieval step to be conditioned on previously selected examples. We train the retriever in two stages: first, we efficiently construct supervised sequential data for initial policy training; we then refine the policy to align with the LLM's preferences using a reward grounded in the structural correspondence of generated programs. Experimental results show that our method consistently and significantly outperforms baselines, underscoring the importance of explicitly modeling inter-example dependencies. These findings highlight the potential of compositional retrieval for tasks requiring multiple pieces of evidence or examples.
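To make the abstract's decomposition concrete, the sketch below shows a greedy rollout of the described MDP: the joint probability of a retrieved set is factored into per-step conditionals, and each step scores candidates against both the query and the composition of previously selected snippets. The three encoders of the tri-encoder are stood in for by random projections, and the scoring and composition rules are illustrative assumptions rather than the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Stand-ins for the three learned encoders (the "tri-encoder"):
# one each for the query, the candidates, and the retrieved-so-far history.
W_query, W_cand, W_hist = (rng.standard_normal((DIM, DIM)) for _ in range(3))

def encode(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    h = W @ v
    return h / np.linalg.norm(h)

def sequential_retrieve(query, candidates, steps=3):
    """Greedy MDP rollout: each step conditions on the query AND the
    composition of previously selected snippets (the state)."""
    q = encode(W_query, query)
    history = np.zeros(DIM)            # state: summary of selections so far
    selected = []
    remaining = dict(enumerate(candidates))
    for _ in range(steps):
        h = encode(W_hist, history) if selected else np.zeros(DIM)
        # per-step conditional score for p(c | query, history)
        scores = {i: (q + h) @ encode(W_cand, c)
                  for i, c in remaining.items()}
        best = max(scores, key=scores.get)
        selected.append(best)
        history = history + candidates[best]   # compose the new context
        del remaining[best]
    return selected

candidates = [rng.standard_normal(DIM) for _ in range(10)]
query = rng.standard_normal(DIM)
print(sequential_retrieve(query, candidates))  # indices of 3 chosen snippets
```

The key contrast with single-pass top-k retrieval is the `history` term: the score of each candidate changes as the context grows, which is what lets the retriever model inter-example dependencies.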