🤖 AI Summary
Two-tower models are efficient for pre-ranking, but the strict decoupling of the user and item towers rules out cross-tower feature interaction and limits their expressiveness. To address this, we propose FIT (learnable Fully Interacted Two-tower Model), an architecture that removes this bottleneck while preserving inference efficiency. FIT introduces two components: (1) a Meta Query Module (MQM), built on a learnable item meta matrix, for early interaction between user and item features, and (2) a Lightweight Similarity Scorer (LSS) for fine-grained late interaction. FIT supports end-to-end joint optimization without giving up the computational benefits of the two-tower paradigm. Experiments on multiple public benchmarks show that FIT consistently outperforms state-of-the-art pre-ranking models (including YouTube DNN, DSSM, and TwinBERT), achieving 3.2–7.8% absolute gains in Recall@10 while increasing inference latency by less than 5%.
📝 Abstract
Pre-ranking plays a crucial role in large-scale recommender systems: it improves efficiency and scalability while still delivering high-quality candidate sets in real time. The two-tower model is widely used in pre-ranking because its decoupled architecture strikes a good balance between efficiency and effectiveness: it processes user and item inputs independently and only then computes their interaction (e.g., a dot product or another similarity measure). However, this independence also means that no information is exchanged between the two towers, which limits effectiveness. In this paper, we propose a novel architecture, the learnable Fully Interacted Two-tower Model (FIT), which enables rich information interaction while preserving inference efficiency. FIT consists of two main parts: the Meta Query Module (MQM) and the Lightweight Similarity Scorer (LSS). Specifically, MQM introduces a learnable item meta matrix to achieve expressive early interaction between user and item features, and LSS is designed to capture effective late interaction between the user and item towers. Experimental results on several public datasets show that FIT significantly outperforms state-of-the-art baseline pre-ranking models.
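To make the contrast between the decoupled baseline and the two kinds of interaction more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify how MQM or LSS are built, so everything here is an assumption for illustration: the one-layer `tower`, the attention-style `meta_query` over a learnable meta matrix, the weighted element-wise product in `lightweight_scorer`, and all sizes (`D`, `M`, input widths).

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8  # embedding width (assumed)
M = 4  # rows in the learnable item meta matrix (assumed)


def tower(x, w):
    """One-layer stand-in for a user or item tower: linear map + ReLU."""
    return np.maximum(x @ w, 0.0)


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def meta_query(user_vec, meta_matrix):
    """Hypothetical MQM: the user vector attends over learnable meta rows.

    The meta matrix is a model parameter, not a per-item input, so this
    step depends only on the user side; item vectors can still be
    precomputed offline.
    """
    scale = np.sqrt(user_vec.shape[-1])
    attn = softmax(user_vec @ meta_matrix.T / scale)  # (B, M)
    return user_vec + attn @ meta_matrix              # (B, D)


def lightweight_scorer(user_vec, item_vecs, w_out):
    """Hypothetical LSS: weighted sum over element-wise user-item products."""
    inter = user_vec[:, None, :] * item_vecs[None, :, :]  # (B, N, D)
    return inter @ w_out                                   # (B, N)


# Toy inputs: 2 users with 16 raw features, 5 items with 12 raw features.
user_x = rng.normal(size=(2, 16))
item_x = rng.normal(size=(5, 12))
w_user = rng.normal(size=(16, D))
w_item = rng.normal(size=(12, D))
meta = rng.normal(size=(M, D))
w_out = rng.normal(size=D)

u = tower(user_x, w_user)  # user embeddings, computed per request
v = tower(item_x, w_item)  # item embeddings, precomputable offline

# Decoupled baseline: the towers meet only in a final dot product.
baseline_scores = u @ v.T  # (2, 5)

# FIT-flavoured sketch: early interaction via the meta matrix,
# late interaction via the small scorer.
fit_scores = lightweight_scorer(meta_query(u, meta), v, w_out)  # (2, 5)
```

In this sketch, setting `w_out` to a vector of ones reduces `lightweight_scorer` to the plain dot product, so the baseline is a special case of the scorer. Whether the actual LSS has this property is not stated in the abstract.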