FastLane: Efficient Routed Systems for Late-Interaction Retrieval

📅 2026-01-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes FastLane, a framework that addresses two limitations of late-interaction retrieval models such as ColBERT: their high computational cost and their poor compatibility with approximate nearest neighbor search (ANNS). FastLane introduces a learnable routing mechanism, optimized jointly with the embedding model, that combines self-attention with a differentiable selection strategy to keep only the most informative token-level representations of a query and so avoid redundant token comparisons. The authors present this as a way to bridge late-interaction models and ANNS, and as a path toward multilingual, multimodal, and long-context retrieval. Experiments report up to a 30-fold reduction in computational complexity with competitive retrieval performance, which makes low-latency retrieval at large scale more practical.
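To make the cost argument concrete, here is a minimal Python sketch comparing standard ColBERT-style MaxSim scoring with a routed variant that scores only `k` selected query tokens. The router is an assumption: a parameter-free heuristic that ranks query tokens by the average self-attention weight they receive. It stands in for FastLane's learned router, whose exact form is not described here. The names `attention_importance`, `route`, and `routed_maxsim` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product between two token embeddings."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query: List[Vector], doc: List[Vector]) -> float:
    """ColBERT-style late interaction: each query token takes its best-matching
    document token, and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def attention_importance(query: List[Vector]) -> List[float]:
    """Hypothetical router score: the average self-attention weight each query
    token receives (scaled dot-product attention with no learned weights)."""
    n, dim = len(query), len(query[0])
    scale = 1.0 / math.sqrt(dim)
    received = [0.0] * n
    for i in range(n):
        logits = [dot(query[i], query[j]) * scale for j in range(n)]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        for j in range(n):
            received[j] += exps[j] / total  # attention that token i pays to token j
    return [r / n for r in received]  # each row sums to 1, so the scores sum to 1


def route(query: List[Vector], k: int) -> List[Vector]:
    """Keep the k query tokens with the highest importance, in their original order."""
    scores = attention_importance(query)
    top = sorted(range(len(query)), key=lambda i: scores[i], reverse=True)[:k]
    return [query[i] for i in sorted(top)]


def routed_maxsim(query: List[Vector], doc: List[Vector], k: int) -> float:
    """MaxSim over the routed tokens only: k * len(doc) dot products
    instead of len(query) * len(doc)."""
    return maxsim(route(query, k), doc)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    doc = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    print("full MaxSim:", maxsim(query, doc))
    print("routed MaxSim (k=1):", routed_maxsim(query, doc, k=1))
```

Full MaxSim needs `len(query) * len(doc)` dot products per candidate document, while the routed version needs `k * len(doc)`. The router itself costs O(n²·d) in the query length n, but it runs once per query rather than once per document, so its cost does not grow with the collection.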

📝 Abstract
Late-interaction retrieval models like ColBERT achieve superior accuracy by enabling token-level interactions, but their computational cost hinders scalability and integration with Approximate Nearest Neighbor Search (ANNS). We introduce FastLane, a novel retrieval framework that dynamically routes queries to their most informative representations, eliminating redundant token comparisons. FastLane employs a learnable routing mechanism optimized alongside the embedding model, leveraging self-attention and differentiable selection to maximize efficiency. Our approach reduces computational complexity by up to 30x while maintaining competitive retrieval performance. By bridging late-interaction models with ANNS, FastLane enables scalable, low-latency retrieval, making it feasible for large-scale applications such as search engines, recommendation systems, and question-answering platforms. This work opens pathways for multi-lingual, multi-modal, and long-context retrieval, pushing the frontier of efficient and adaptive information retrieval.
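The abstract also mentions differentiable selection. One common way to make a discrete choice trainable is a temperature-scaled softmax over router logits. It is used here purely as an illustrative assumption, not as FastLane's confirmed technique. During training every token contributes with a soft weight, and as the temperature approaches zero the weights approach a hard top-1 choice. The sketch below contrasts the two modes. `soft_selection`, `soft_routed_score`, and `hard_routed_score` are hypothetical names, and the router logits are supplied by hand instead of being learned.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product between two token embeddings."""
    return sum(x * y for x, y in zip(a, b))


def soft_selection(router_logits: List[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax over query tokens: a differentiable relaxation of argmax."""
    scaled = [logit / temperature for logit in router_logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def per_token_maxsim(query: List[Vector], doc: List[Vector]) -> List[float]:
    """Best document match for each query token (the MaxSim terms before summing)."""
    return [max(dot(q, d) for d in doc) for q in query]


def soft_routed_score(query: List[Vector], doc: List[Vector],
                      router_logits: List[float], temperature: float) -> float:
    """Training-time score: every token contributes, weighted by its soft selection weight,
    so the score varies smoothly with the router logits."""
    weights = soft_selection(router_logits, temperature)
    return sum(w * s for w, s in zip(weights, per_token_maxsim(query, doc)))


def hard_routed_score(query: List[Vector], doc: List[Vector],
                      router_logits: List[float]) -> float:
    """Inference-time score: only the highest-scoring token is compared with the document."""
    best = max(range(len(query)), key=lambda i: router_logits[i])
    return max(dot(query[best], d) for d in doc)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    doc = [[1.0, 0.0], [0.5, 0.5]]
    logits = [2.0, 0.0]
    for t in (10.0, 1.0, 0.1):
        print(f"soft score at temperature {t}:", soft_routed_score(query, doc, logits, t))
    print("hard score:", hard_routed_score(query, doc, logits))
```

With hard selection at inference, a query reduces to the embeddings of the chosen tokens, which is what would let an ordinary single-vector ANNS index be queried in place of a per-token late-interaction scan. The abstract does not state how many representations FastLane keeps per query.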

Problem

late-interaction retrieval
computational efficiency
Approximate Nearest Neighbor Search
scalable retrieval
token-level interaction

Innovation

FastLane
late-interaction retrieval
learnable routing
approximate nearest neighbor search
efficient retrieval