Beyond Static Collision Handling: Adaptive Semantic ID Learning for Multimodal Recommendation at Industrial Scale

📅 2026-04-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of semantic ID (SID) learning in large-scale multimodal recommendation: static collision-handling strategies make it hard to achieve compactness, semantic fidelity, and strong recommendation performance at the same time. The paper proposes AdaSID, a framework that regulates SID overlaps adaptively in two stages. First, it relaxes repulsion for observed overlaps between semantically compatible items, so that admissible sharing is preserved rather than uniformly separated. Second, it allocates the remaining regulation pressure according to local collision load and training progress, strengthening control in congested regions while gradually shifting optimization toward recommendation alignment. On two public benchmarks, AdaSID improves Recall and NDCG by about 4.5% on average over strong baselines, along with better codebook utilization and SID diversity. In an online A/B test on Kuaishou e-commerce short-video retrieval, it achieves a statistically significant 0.98% GMV improvement, and a separate industrial ranking evaluation shows consistent AUC improvements.
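To make the notion of an SID collision concrete, the following minimal sketch counts how many items share their full code with another item and how much of each codebook level is in use. It is an illustration only: the function name `collision_stats`, the tuple-based SID representation, and the exact metric definitions are assumptions made here and are not taken from the paper.

```python
from collections import Counter


def collision_stats(sids, codebook_size):
    """Summarize collisions for a list of SIDs.

    sids: list of equal-length tuples of ints, one tuple per item
          (hypothetical representation, one token per codebook level).
    codebook_size: number of entries in each level's codebook (assumed equal).
    """
    counts = Counter(sids)
    n_items = len(sids)
    # Items whose full SID is shared with at least one other item.
    collided = sum(c for c in counts.values() if c > 1)
    # Fraction of codebook entries actually used at each level.
    n_levels = len(sids[0])
    utilization = [
        len({sid[level] for sid in sids}) / codebook_size
        for level in range(n_levels)
    ]
    return {
        "collision_rate": collided / n_items,
        "unique_sids": len(counts),
        "utilization": utilization,
    }


if __name__ == "__main__":
    toy_sids = [(0, 1), (0, 1), (2, 3), (1, 1)]
    print(collision_stats(toy_sids, codebook_size=4))
```

In this toy input, two of the four items share the code `(0, 1)`, so half of the items are involved in a collision. Whether such a collision should be suppressed or kept is the question the adaptive mechanism is meant to answer.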

📝 Abstract

Modern recommendation systems involve massive catalogs of multimodal items, where scalable item identification must balance compactness, semantic fidelity, and downstream effectiveness. Semantic IDs (SIDs) address this need by representing items as short discrete token sequences derived from multimodal signals, providing a compact interface for retrieval, ranking, and generative recommendation. However, effective SID learning is hindered by collisions, where different items are assigned identical or highly confusable codes. Existing methods mainly rely on improved quantization or fixed overlap regularization, but they do not adaptively distinguish whether an overlap should be suppressed or preserved. We propose AdaSID, an adaptive semantic ID learning framework for recommendation. AdaSID regulates SID overlaps through a two-stage process. First, it relaxes repulsion for observed overlaps when the involved items are semantically compatible, preserving admissible sharing rather than uniformly separating all collisions. Second, it allocates the remaining regulation pressure according to local collision load and training progress, strengthening control in congested regions while gradually rebalancing optimization toward recommendation alignment. This design adaptively decides which overlaps to penalize, how strongly to regulate them, and when to shift the learning focus. Extensive offline and online experiments validate AdaSID. On two public benchmarks, AdaSID improves Recall and NDCG by about 4.5% on average over strong baselines, while improving codebook utilization and SID diversity. In Kuaishou e-commerce, an online A/B test on short-video retrieval covering tens of millions of users achieves statistically significant gains, including a 0.98% GMV improvement, and industrial ranking evaluation shows consistent AUC improvements.
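The two-stage regulation described in the abstract can be sketched as a single per-overlap weight on a repulsion penalty. The abstract does not give the actual formulas, so everything below is a hypothetical stand-in: the function name `repulsion_weight`, the linear compatibility gate, the `1 - 1/bucket_size` load factor, the cosine schedule, and the default threshold are all assumptions chosen only to show the qualitative behavior.

```python
import math


def repulsion_weight(similarity, bucket_size, step, total_steps,
                     compat_threshold=0.8, base_weight=1.0):
    """Illustrative weight for the repulsion penalty on one observed overlap.

    similarity: semantic similarity of the two overlapping items, in [0, 1].
    bucket_size: number of items currently sharing the code (local load).
    step, total_steps: training progress.
    All functional forms here are assumptions, not the paper's definitions.
    """
    # Stage 1: relax repulsion for semantically compatible overlaps.
    # The gate is 1 below the threshold and falls linearly to 0 at similarity 1.
    gate = (1.0 - similarity) / (1.0 - compat_threshold)
    gate = min(1.0, max(0.0, gate))

    # Stage 2a: stronger control in congested regions.
    # 0 when the item is alone in its bucket, approaching 1 as load grows.
    load = 1.0 - 1.0 / max(1, bucket_size)

    # Stage 2b: decay regulation with training progress (cosine schedule),
    # shifting the focus toward the recommendation objective.
    progress = min(1.0, max(0.0, step / total_steps))
    decay = 0.5 * (1.0 + math.cos(math.pi * progress))

    return base_weight * gate * load * decay


if __name__ == "__main__":
    # Compatible overlap vs. incompatible overlap, same load and step.
    print(repulsion_weight(0.95, bucket_size=4, step=0, total_steps=100))
    print(repulsion_weight(0.30, bucket_size=4, step=0, total_steps=100))
```

Under these assumptions, a total loss could take the form `rec_loss + weight * repulsion_term`: compatible overlaps receive little pressure, crowded codes receive more, and the penalty fades as training proceeds so that recommendation alignment dominates late in training.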

Problem

Research questions and friction points this paper is trying to address.

Semantic ID

collision handling

multimodal recommendation

adaptive learning

code overlap

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Semantic ID

Collision Handling

Multimodal Recommendation