AI Summary
This work addresses a weakness of traditional item ID vocabularies under sparse scaling: noise from low-frequency items induces embedding collapse and degrades generalization. To mitigate this, we propose the first application of orthogonality constraints to industrial-scale recommendation systems under sparse scaling. The constraint aligns the singular value spectrum of the embedding manifold with an orthogonal basis during backpropagation, which improves representation isotropy and suppresses overfitting to rare items. By combining high-singular-entropy embedding learning with large-scale sparse vocabulary optimization, the method achieves a 12.97% improvement in UCXR and an 8.9% increase in GMV in JD.com's production system, while substantially accelerating convergence. These results demonstrate the approach's scalability and effectiveness across both sparse and dense architectures.
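"High singular entropy" can be made concrete with a short computation. The sketch below assumes one common definition, entropy over singular values normalized to sum to 1 and divided by log(k) so the score lies in [0, 1]; the summary does not state the paper's exact formula, and the function name and toy matrices are illustrative only. A flat spectrum (isotropic embeddings) scores 1, while a table collapsed onto a single direction scores near 0.

```python
import numpy as np


def singular_entropy(emb: np.ndarray, normalize: bool = True) -> float:
    """Entropy of the normalized singular-value spectrum of an embedding matrix.

    Assumption: p_i = sigma_i / sum_j sigma_j (some works use sigma_i**2).
    With normalize=True the entropy is divided by log(k), k = min(n, d),
    so 1.0 means a perfectly flat (isotropic) spectrum and values near 0
    mean the rows have collapsed onto a few directions.
    """
    s = np.linalg.svd(emb, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    h = float(-(p * np.log(p)).sum())
    if normalize:
        h /= float(np.log(min(emb.shape)))
    return h


rng = np.random.default_rng(0)
n, d = 1000, 64  # hypothetical vocabulary slice: 1000 items, 64-dim embeddings

# Orthonormal columns: every singular value equals 1, so the spectrum is flat.
isotropic, _ = np.linalg.qr(rng.standard_normal((n, d)))

# Rank-1 table: every item embedding is a multiple of the same vector.
collapsed = np.outer(rng.standard_normal(n), rng.standard_normal(d))

print(round(singular_entropy(isotropic), 4))  # 1.0
print(round(singular_entropy(collapsed), 4))  # 0.0
```

Normalizing by log(k) makes scores comparable across embedding widths, which is useful when tracking collapse while scaling a vocabulary.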
Abstract
In industrial commodity recommendation systems, the representation quality of Item-ID vocabularies directly affects the scalability and generalization of recommendation models. A key challenge is that traditional Item-ID vocabularies, when subjected to sparse scaling, suffer interference from low-frequency information, which limits their expressive power over massive item sets and leads to representation collapse. To address this issue, we propose Orthogonal Constrained Projection (OCP) to optimize embedding representations. By enforcing orthogonality, the projection constrains the backpropagation manifold, aligning the singular value spectrum of the learned embeddings with the orthogonal basis. This alignment ensures high singular entropy, thereby preserving isotropic, generalizable features while suppressing spurious correlations and overfitting to rare items. Empirical results demonstrate that OCP accelerates loss convergence and enhances the model's scalability; notably, it enables consistent performance gains when scaling up dense layers. Large-scale industrial deployment on JD.com further confirms its efficacy, yielding a 12.97% increase in UCXR and an 8.9% uplift in GMV and highlighting its utility for scaling up both sparse vocabularies and dense architectures.
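The abstract does not say how OCP keeps its projection orthogonal, so the following is only a minimal sketch under that gap. It uses one standard option, a polar (SVD) retraction back onto the orthogonal group after each plain SGD step; Cayley or QR parametrizations are common alternatives. All function names are hypothetical, and the random arrays stand in for real item embeddings and upstream gradients. The sketch checks three properties that follow directly from orthogonality: the constraint survives the update, backpropagation through the projection preserves gradient norm, and the projection leaves the embeddings' singular values unchanged.

```python
import numpy as np

# Hypothetical sketch of an orthogonally constrained projection layer;
# not the paper's implementation.


def nearest_orthogonal(w: np.ndarray) -> np.ndarray:
    """Polar retraction: the orthogonal matrix closest to w (Frobenius norm)."""
    u, _, vt = np.linalg.svd(w)
    return u @ vt


def forward(e: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Project a batch of item embeddings e (batch x d) with w (d x d)."""
    return e @ w


def backward(e: np.ndarray, w: np.ndarray, grad_z: np.ndarray):
    """Gradients of z = e @ w: dL/de = grad_z @ w.T, dL/dw = e.T @ grad_z."""
    return grad_z @ w.T, e.T @ grad_z


def constrained_sgd_step(w: np.ndarray, grad_w: np.ndarray, lr: float) -> np.ndarray:
    """Plain SGD step followed by a retraction back onto the orthogonal group."""
    return nearest_orthogonal(w - lr * grad_w)


rng = np.random.default_rng(0)
d, batch = 32, 8
w = nearest_orthogonal(rng.standard_normal((d, d)))
e = rng.standard_normal((batch, d))       # stand-in for looked-up item embeddings
grad_z = rng.standard_normal((batch, d))  # stand-in for the upstream gradient

grad_e, grad_w = backward(e, w, grad_z)
w_next = constrained_sgd_step(w, grad_w, lr=0.1)

# 1) The orthogonality constraint survives the update.
print(np.allclose(w_next.T @ w_next, np.eye(d)))  # True
# 2) Backprop through an orthogonal w neither amplifies nor shrinks the gradient.
print(np.isclose(np.linalg.norm(grad_e), np.linalg.norm(grad_z)))  # True
# 3) The projection leaves the embeddings' singular values unchanged.
print(np.allclose(np.linalg.svd(forward(e, w), compute_uv=False),
                  np.linalg.svd(e, compute_uv=False)))  # True
```

Retracting after an unconstrained step keeps the optimizer simple at the cost of one d x d SVD per update, which is cheap relative to the embedding table itself.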