🤖 AI Summary
Complex tasks requiring integration of multi-source evidence necessitate compositional retrieval—selecting and combining context fragments across multiple steps.
Method: We propose a serialized compositional retrieval paradigm that formalizes retrieval as a Markov Decision Process (MDP), iteratively selecting and composing contextual snippets to construct information-complete prompts. Our approach introduces a novel tri-encoder architecture to explicitly model cross-sample dependencies and designs a structural consistency reward aligned with LLM preferences, optimized end-to-end via supervised pretraining followed by reinforcement fine-tuning.
Contribution/Results: Evaluated on multi-hop reasoning and code synthesis benchmarks, our method significantly outperforms single-step retrieval baselines. It demonstrates both effectiveness in constructing complex, task-relevant contexts and strong generalization across diverse compositional reasoning tasks, validating the paradigm’s utility for evidence aggregation in advanced language understanding and generation.
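The summary does not spell out how the "structural consistency reward" is computed. One plausible reading for the code-synthesis setting is a reward based on how closely the structure of a generated program matches a reference program, independent of surface details like identifier names. The sketch below illustrates that idea using Python AST node-type sequences; the function names and the specific similarity measure are illustrative assumptions, not the paper's actual reward.

```python
import ast
from difflib import SequenceMatcher

def node_types(source: str) -> list[str]:
    """Flatten a program's AST into a sequence of node-type names.
    Identifier names and literal values are deliberately ignored."""
    return [type(n).__name__ for n in ast.walk(ast.parse(source))]

def structural_reward(generated: str, reference: str) -> float:
    """Hypothetical reward in [0, 1]: similarity of the two programs'
    AST node-type sequences (a stand-in for structural correspondence)."""
    return SequenceMatcher(None, node_types(generated),
                           node_types(reference)).ratio()

# Same structure with different identifiers/constants → maximal reward
print(structural_reward("def f(x): return x + 1",
                        "def g(y): return y + 2"))  # → 1.0
```

A reward like this rewards the retriever for assembling contexts that lead the LLM toward structurally correct programs, even before exact-match metrics would fire.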
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities across numerous tasks, yet they often rely on external context to handle complex tasks. While retrieval-augmented frameworks traditionally focus on selecting top-ranked documents in a single pass, many real-world scenarios demand compositional retrieval, where multiple sources must be combined in a coordinated manner. In this work, we propose a tri-encoder sequential retriever that models this process as a Markov Decision Process (MDP), decomposing the probability of retrieving a set of elements into a sequence of conditional probabilities and allowing each retrieval step to be conditioned on previously selected examples. We train the retriever in two stages: first, we efficiently construct supervised sequential data for initial policy training; we then refine the policy to align with the LLM's preferences using a reward grounded in the structural correspondence of generated programs. Experimental results show that our method consistently and significantly outperforms baselines, underscoring the importance of explicitly modeling inter-example dependencies. These findings highlight the potential of compositional retrieval for tasks requiring multiple pieces of evidence or examples.
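To make the abstract's decomposition concrete, the sketch below shows a greedy rollout of the described MDP: the joint probability of a retrieved set is factored into per-step conditionals, and each step scores candidates against both the query and the composition of previously selected snippets. The three encoders of the tri-encoder are stood in for by random projections, and the scoring and composition rules are illustrative assumptions rather than the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Stand-ins for the three learned encoders (the "tri-encoder"):
# one each for the query, the candidates, and the retrieved-so-far history.
W_query, W_cand, W_hist = (rng.standard_normal((DIM, DIM)) for _ in range(3))

def encode(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    h = W @ v
    return h / np.linalg.norm(h)

def sequential_retrieve(query, candidates, steps=3):
    """Greedy MDP rollout: each step conditions on the query AND the
    composition of previously selected snippets (the state)."""
    q = encode(W_query, query)
    history = np.zeros(DIM)            # state: summary of selections so far
    selected = []
    remaining = dict(enumerate(candidates))
    for _ in range(steps):
        h = encode(W_hist, history) if selected else np.zeros(DIM)
        # per-step conditional score for p(c | query, history)
        scores = {i: (q + h) @ encode(W_cand, c)
                  for i, c in remaining.items()}
        best = max(scores, key=scores.get)
        selected.append(best)
        history = history + candidates[best]   # compose the new context
        del remaining[best]
    return selected

candidates = [rng.standard_normal(DIM) for _ in range(10)]
query = rng.standard_normal(DIM)
print(sequential_retrieve(query, candidates))  # indices of 3 chosen snippets
```

The key contrast with single-pass top-k retrieval is the `history` term: the score of each candidate changes as the context grows, which is what lets the retriever model inter-example dependencies.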