SID-Coord: Coordinating Semantic IDs for ID-based Ranking in Short-Video Search

📅 2026-04-12
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the limited generalization of ranking models built on hashed item identifiers (HIDs) to long-tail content in short-video search. To improve generalization while preserving memorization, and without modifying the backbone model, the authors propose SID-Coord, a lightweight framework that adds discrete, trainable Semantic IDs (SIDs) to ID-based ranking models. It has three components: attention-based fusion over hierarchical SIDs, a target-aware HID-SID gating mechanism that balances memorization and generalization, and a SID-driven interest alignment module. The framework can be integrated into existing production ranking systems. Online A/B experiments show statistically significant improvements in the search scenario: +0.664% in long-play rate and +0.369% in playback duration.
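As a rough illustration of attention-based fusion over a hierarchical SID, the sketch below pools one embedding per SID level into a single vector using softmax attention. The function names, the scaled dot-product scoring, the `query` vector, and the toy embeddings are all assumptions made for illustration; the summary does not specify the paper's actual formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_sid_levels(level_embeddings, query):
    """Attention-pool one embedding per SID level into a single vector.

    Assumption: scaled dot-product scores against a query vector;
    the real module may score levels differently.
    """
    dim = len(query)
    scores = [dot(e, query) / math.sqrt(dim) for e in level_embeddings]
    weights = softmax(scores)
    fused = [
        sum(w * e[i] for w, e in zip(weights, level_embeddings))
        for i in range(dim)
    ]
    return fused, weights


# Hypothetical 3-level SID (coarse to fine) with 2-dimensional embeddings.
levels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]  # stand-in for a target-side context vector
fused, weights = fuse_sid_levels(levels, query)
```

In this toy setup, the levels whose embeddings align with `query` receive more attention weight, so they contribute more to the fused SID representation.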

📝 Abstract
Large-scale short-video search ranking models are typically trained on sparse co-occurrence signals over hashed item identifiers (HIDs). While effective at memorizing frequent interactions, such ID-based models struggle to generalize to long-tailed items with limited exposure. This memorization-generalization trade-off remains a longstanding challenge in such industrial systems. We propose SID-Coord, a lightweight Semantic ID framework that incorporates discrete, trainable semantic IDs (SIDs) directly into ID-based ranking models. Instead of treating semantic signals as auxiliary dense features, SID-Coord represents semantics as structured identifiers and coordinates HID-based memorization with SID-based generalization within a unified modeling framework. To enable effective coordination, SID-Coord introduces three components: (1) an attention-based fusion module over hierarchical SIDs to capture multi-level semantics, (2) a target-aware HID-SID gating mechanism that adaptively balances memorization and generalization, and (3) a SID-driven interest alignment module that models the semantic similarity distribution between target items and user histories. SID-Coord can be integrated into existing production ranking systems without modifying the backbone model. Online A/B experiments in a real-world production environment show statistically significant improvements, with a +0.664% gain in long-play rate in search and a +0.369% increase in search playback duration.
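To make components (2) and (3) more concrete, here is a minimal sketch under stated assumptions. The gate is modeled as a sigmoid blend of the HID and SID embeddings, with a hypothetical gate logit driven by the target item's exposure count; interest alignment is approximated as a histogram of how many leading SID levels the target shares with each history item. None of these function names, parameters, or formulas come from the abstract, which describes the modules only at a high level.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def exposure_gate_logit(exposure_count, weight=1.0, bias=-2.0):
    """Hypothetical target-aware gate logit.

    Assumption: the gate depends on log exposure; in a trained model
    weight and bias would be learned, and other target features could be used.
    """
    return weight * math.log1p(exposure_count) + bias


def gate_hid_sid(hid_emb, sid_emb, gate_logit):
    """Blend HID (memorization) and SID (generalization) embeddings."""
    g = sigmoid(gate_logit)
    return [g * h + (1.0 - g) * s for h, s in zip(hid_emb, sid_emb)]


def sid_match_depth(target_sid, history_sid):
    """Count the leading SID levels shared by two hierarchical SIDs."""
    depth = 0
    for t, h in zip(target_sid, history_sid):
        if t != h:
            break
        depth += 1
    return depth


def alignment_histogram(target_sid, history_sids):
    """Fraction of history items at each prefix-match depth (0..levels).

    Assumption: prefix-match depth is used as a simple proxy for
    semantic similarity between the target and each history item.
    """
    levels = len(target_sid)
    counts = [0] * (levels + 1)
    for h in history_sids:
        counts[sid_match_depth(target_sid, h)] += 1
    n = len(history_sids)
    return [c / n for c in counts] if n else [0.0] * (levels + 1)


# A cold item (0 exposures) leans on its SID; a popular one leans on its HID.
cold = gate_hid_sid([1.0, 1.0], [0.0, 0.0], exposure_gate_logit(0))
hot = gate_hid_sid([1.0, 1.0], [0.0, 0.0], exposure_gate_logit(10000))

# Toy 3-level SIDs: history items match the target at depths 3, 2, 1, 0.
hist = alignment_histogram((3, 7, 1), [(3, 7, 1), (3, 7, 9), (3, 2, 1), (5, 7, 1)])
# -> [0.25, 0.25, 0.25, 0.25]
```

The resulting histogram could be fed to the ranker as a compact feature describing how semantically close the target is to the user's history, while the gated embedding replaces or supplements the plain HID embedding.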
Problem

Research questions and friction points this paper is trying to address.

short-video search
ID-based ranking
memorization-generalization trade-off
long-tailed items
semantic representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic ID
ID-based Ranking
Memorization-Generalization Trade-off
Short-video Search
Interest Alignment