🤖 AI Summary
This work proposes Thought 1 (T1), a generative retrieval model that targets a failure mode of static representation alignment in reasoning-intensive retrieval: relevance judgments break down when queries and documents are lexically mismatched or when relevance hinges on implicit inference. T1 dynamically generates intermediate reasoning traces for each query and introduces a special <embtoken> as a semantic aggregation point, thereby shifting relevance modeling from static alignment to dynamic reasoning. The model is trained via a three-stage curriculum, with GRPO-based reinforcement learning in the final stage to optimize its reasoning strategy. As the first approach to integrate dynamic reasoning generation into retrieval systems, T1 transcends the limitations of conventional contrastive learning by enabling query-adaptive reasoning paths. On the BRIGHT benchmark, T1-4B outperforms larger contrastive models under the original query setting and achieves performance comparable to multi-stage retrieval pipelines.
📝 Abstract
The central challenge of reasoning-intensive retrieval lies in identifying implicit reasoning relationships between queries and documents, rather than superficial semantic or lexical similarity. The contrastive learning paradigm is fundamentally a static representation consolidation technique: during training, it encodes hierarchical relevance concepts into fixed geometric structures in the vector space, and at inference time it cannot dynamically adjust relevance judgments according to the specific reasoning demands of each query. Consequently, performance degrades noticeably when vocabulary mismatch exists between queries and documents or when implicit reasoning is required to establish relevance. This paper proposes Thought 1 (T1), a generative retrieval model that shifts relevance modeling from static alignment to dynamic reasoning. On the query side, T1 dynamically generates intermediate reasoning trajectories for each query to bridge implicit reasoning relationships and uses <embtoken> as a semantic aggregation point for the reasoning output. On the document side, it employs an instruction + text + <embtoken> encoding format to support high-throughput indexing. To internalize dynamic reasoning capabilities into vector representations, we adopt a three-stage training curriculum and introduce GRPO in the third stage, enabling the model to learn optimal derivation strategies for different queries through trial-and-error reinforcement learning. On the BRIGHT benchmark, T1-4B exhibits strong performance under the original query setting, outperforming larger models trained with contrastive learning overall, and achieving performance comparable to multi-stage retrieval pipelines. The results demonstrate that replacing static representation alignment with dynamic reasoning generation can effectively improve reasoning-intensive retrieval performance.
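The asymmetric flow described above — documents encoded in one pass as instruction + text + <embtoken>, queries first expanded with a generated reasoning trace before the <embtoken> is appended — can be illustrated with a minimal toy sketch. Everything here is a hypothetical stand-in: the paper's model is an LLM whose hidden state at the <embtoken> position serves as the embedding, whereas this mock uses a bag-of-words vector and a hard-coded "reasoner"; only the pipeline shape is meant to match.

```python
# Toy sketch of a T1-style retrieval flow. All components are hypothetical
# stand-ins for illustration; the actual model is an LLM, not this mock.
import math
from collections import Counter

EMB_TOKEN = "<embtoken>"

def mock_hidden_state(tokens):
    """Stand-in for the LLM hidden state at the <embtoken> position:
    here, simply a sparse bag-of-words vector over the preceding context."""
    assert tokens[-1] == EMB_TOKEN
    return Counter(t.lower() for t in tokens[:-1])

def encode_document(instruction, text):
    # Document side: instruction + text + <embtoken>, a single forward pass,
    # which is what makes high-throughput indexing possible.
    return mock_hidden_state(instruction.split() + text.split() + [EMB_TOKEN])

def encode_query(query, generate_reasoning):
    # Query side: first generate an intermediate reasoning trace, then
    # aggregate query + trace at the trailing <embtoken>.
    trace = generate_reasoning(query)
    return mock_hidden_state(query.split() + trace.split() + [EMB_TOKEN])

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical reasoning generator bridging a lexical mismatch: the query
# never says "photosynthesis"; the generated trace introduces the term.
def toy_reasoner(query):
    return "plants convert light into chemical energy via photosynthesis"

docs = {
    "d1": encode_document("Represent this passage:",
                          "photosynthesis converts light energy in plants"),
    "d2": encode_document("Represent this passage:",
                          "stock market prices fell sharply today"),
}
q = encode_query("how do leaves make food from sunlight", toy_reasoner)
ranked = sorted(docs, key=lambda d: cosine(q, docs[d]), reverse=True)
print(ranked[0])  # the photosynthesis passage ranks first
```

Without the generated trace the query shares no terms with either passage in this toy; the trace is what creates the overlap, mirroring the paper's claim that dynamic reasoning bridges vocabulary mismatch that static alignment cannot.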