CS3: Efficient Online Capability Synergy for Two-Tower Recommendation

📅 2026-03-10

🤖 AI Summary
This work addresses three limitations of two-tower recommendation models that stem from their structural isolation: restricted representational capacity, difficulty aligning the two embedding spaces, and insufficient cross-feature modeling. Existing remedies struggle to combine efficient online learning with low-latency inference. The authors propose the CS3 framework, which integrates cycle-adaptive feature denoising, cross-tower representation synchronization, and cascaded knowledge reuse to provide intra-tower self-correction, inter-tower mutual awareness, and cross-stage consistency. CS3 improves recommendation performance while keeping millisecond-level response times, is compatible with diverse two-tower architectures, and supports efficient online learning. Evaluations on three public benchmarks and a large-scale advertising system show gains of up to 8.36% in online ad revenue.
📝 Abstract
To balance effectiveness and efficiency in recommender systems, multi-stage pipelines employ lightweight two-tower models for large-scale candidate retrieval. However, their isolated architecture inherently limits representation capacity, embedding-space alignment, and cross-feature modeling. Prior studies have explored late interaction and knowledge distillation to mitigate these issues, but such approaches often significantly increase model latency or are difficult to implement in online learning scenarios. To address these limitations, we propose an efficient online framework called Capability Synergy (CS3), which enhances two-tower models through three key innovations: (1) Cycle-Adaptive Structure, enabling self-revision via adaptive feature denoising within individual towers; (2) Cross-Tower Synchronization, improving representation alignment through mutual awareness between the towers; and (3) Cascade-Model Sharing, bridging cross-stage consistency by reusing knowledge from downstream models. The CS3 framework is compatible with various two-tower architectures and meets real-time requirements in online learning scenarios. We evaluated CS3 on three public offline datasets and subsequently deployed it in a large-scale advertising system. Experimental results demonstrate that CS3 increases online ad revenue by up to 8.36% across three scenarios while maintaining millisecond-level latency and performing consistently well across diverse two-tower architectures.
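For readers unfamiliar with the baseline the abstract starts from, the following is a minimal sketch of a plain two-tower retriever. It is not the CS3 method: the towers here are random linear maps, and every name (`make_tower`, `embed`, `score`, `retrieve`, the feature dimensions, the item IDs) is a hypothetical choice made for illustration. It only shows the isolation the abstract describes, where the user and item sides never see each other's features and meet only in a final dot product.

```python
import random


def make_tower(in_dim, out_dim, seed):
    """Build a toy 'tower': a random linear map stored as a list of rows.

    Assumption: a real tower would be a trained neural network.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
            for _ in range(out_dim)]


def embed(tower, features):
    """Project a feature vector through one tower and L2-normalize it."""
    raw = [sum(w * x for w, x in zip(row, features)) for row in tower]
    norm = sum(v * v for v in raw) ** 0.5 or 1.0  # guard against a zero vector
    return [v / norm for v in raw]


def score(user_vec, item_vec):
    """The only user-item interaction in a two-tower model: a dot product."""
    return sum(u * i for u, i in zip(user_vec, item_vec))


def retrieve(user_vec, item_index, k):
    """Return the IDs of the k items with the highest dot-product score."""
    ranked = sorted(item_index.items(),
                    key=lambda kv: score(user_vec, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in ranked[:k]]


# Each tower sees only its own side's features (4 user dims, 5 item dims).
user_tower = make_tower(in_dim=4, out_dim=3, seed=0)
item_tower = make_tower(in_dim=5, out_dim=3, seed=1)

# Item embeddings can be computed ahead of time because they do not
# depend on the user; this is what keeps retrieval latency low.
item_features = {
    "ad_a": [1.0, 0.0, 0.0, 0.5, 0.2],
    "ad_b": [0.0, 1.0, 0.3, 0.0, 0.9],
    "ad_c": [0.4, 0.4, 1.0, 0.1, 0.0],
}
item_index = {item_id: embed(item_tower, feats)
              for item_id, feats in item_features.items()}

user_vec = embed(user_tower, [1.0, 0.0, 0.5, 0.2])
top2 = retrieve(user_vec, item_index, k=2)
```

Because the item embeddings are independent of the user, they can be indexed offline, which is the efficiency advantage the abstract refers to. The cost is that no cross-feature signal flows between the towers before the dot product, which is the gap the three CS3 components are described as targeting.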
Problem

Research questions and friction points this paper is trying to address.

two-tower recommendation
representation capacity
embedding-space alignment
online learning
model latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

two-tower recommendation
online learning
representation alignment
feature denoising
model sharing