🤖 AI Summary
This work addresses a limitation of traditional retrieval-augmented generation (RAG): it ranks context by semantic similarity and therefore struggles with tasks governed by rules, constraints, or procedural logic. The authors propose Task-Aligned Retrieval (TAG), a framework that reformulates documents into traceable condition–action rules. TAG uses a large language model to judge whether each rule's condition applies to the input, retrieves only the actions whose conditions apply, and generates outputs solely from those actions. By dropping the assumption that semantic similarity implies applicability and reorienting retrieval toward task applicability, TAG selects a more precise and much smaller context. Experiments show consistent improvements over standard RAG on Wikipedia neutral-point-of-view (NPOV) rewriting, HumanEval with PEP 8 compliance, and NBA transaction reasoning on RuleArena, with gains of up to 12.2% in high-mismatch scenarios and up to a 93% reduction in retrieved context size.
📝 Abstract
Retrieval-augmented generation (RAG) ranks passages by semantic similarity to the input, implicitly assuming that semantic similarity is a reliable indication of applicability in downstream tasks. This assumption breaks down when task success depends not on topical relevance but on applying the correct rules, constraints, or procedural guidance. In such settings, the most useful context may be the rule triggered by the input rather than the most semantically similar passage. We propose Task-Aligned Retrieval (TAG), a retrieval framework that replaces similarity-based retrieval with applicability-based rule selection. TAG transforms source documents into traceable condition-action rules, identifies which rules apply to a given input through pairwise LLM judgments, and generates the output conditioned only on the selected actions. We empirically observe that across Wikipedia NPOV rewriting, HumanEval with PEP 8 compliance, and NBA transaction reasoning on RuleArena, TAG consistently outperforms standard RAG, with the largest gains in high-mismatch settings (up to 12.2%) while reducing retrieved context by up to 93%. These results suggest that, in rule- and instruction-governed tasks, retrieval should optimize for applicability rather than for semantic similarity alone.
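The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Rule` fields, the `Judge` signature, the prompt layout, the `doc-1`/`doc-2` source labels, and the two example rules are all assumptions made for illustration, and `toy_judge` is a keyword check standing in for the pairwise LLM judgment.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Rule:
    condition: str  # when the rule applies
    action: str     # what to do when it applies
    source: str     # pointer back to the originating passage (traceability)


# Assumed judge signature: (input_text, condition) -> True if the condition applies.
# In a real system this would wrap an LLM call; here it is left pluggable.
Judge = Callable[[str, str], bool]


def select_rules(input_text: str, rules: List[Rule], judge: Judge) -> List[Rule]:
    """Judge every (input, rule) pair and keep only the applicable rules."""
    return [r for r in rules if judge(input_text, r.condition)]


def build_prompt(input_text: str, selected: List[Rule]) -> str:
    """Build a generation prompt that contains only the selected actions."""
    actions = "\n".join(f"- {r.action} [{r.source}]" for r in selected)
    return f"Apply these instructions:\n{actions}\n\nInput:\n{input_text}"


def toy_judge(input_text: str, condition: str) -> bool:
    """Toy stand-in for an LLM judge: conditions are written as 'mentions <term>'."""
    term = condition.split("mentions ", 1)[-1]
    return term.lower() in input_text.lower()


# Illustrative rules only; the labels doc-1 and doc-2 are hypothetical.
rules = [
    Rule("mentions lambda", "Use a def statement instead of a named lambda.", "doc-1"),
    Rule("mentions import *", "Avoid wildcard imports.", "doc-2"),
]

input_text = "square = lambda x: x * x"
selected = select_rules(input_text, rules, toy_judge)
prompt = build_prompt(input_text, selected)
print(prompt)
```

Only the rule whose condition applies reaches the prompt, regardless of how topically close the other rules are to the input. Because each input is judged against each rule, the number of judge calls grows linearly with the number of rules in this sketch.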