MuonRec: Shifting the Optimizer Paradigm Beyond Adam in Scalable Generative Recommendation

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes MuonRec, an optimization framework for recommender systems that questions whether the near-universal default of Adam/AdamW is actually well suited to large-scale recommendation tasks. MuonRec is the first framework to bring the Muon optimizer to RecSys training, applying orthogonalized momentum updates, computed via the Newton-Schulz iteration, directly to 2D weight matrices. This approach sidesteps key limitations of conventional adaptive optimizers and is compatible with both sequential and generative recommendation models. Extensive experiments show that MuonRec reduces training steps by 32.4% on average while improving NDCG@10 by 12.6%, with particularly pronounced gains in generative recommendation architectures.

📝 Abstract
Recommender systems (RecSys) are increasingly emphasizing scaling, leveraging larger architectures and more interaction data to improve personalization. Yet, despite the optimizer's pivotal role in training, modern RecSys pipelines almost universally default to Adam/AdamW, with limited scrutiny of whether these choices are truly optimal for recommendation. In this work, we revisit optimizer design for scalable recommendation and introduce MuonRec, the first framework that brings the recently proposed Muon optimizer to RecSys training. Muon performs orthogonalized momentum updates for 2D weight matrices via Newton-Schulz iteration, promoting diverse update directions and improving optimization efficiency. We develop an open-source training recipe for recommendation models and evaluate it across both traditional sequential recommenders and modern generative recommenders. Extensive experiments demonstrate that MuonRec reduces converged training steps by an average of 32.4% while simultaneously improving final ranking quality. Specifically, MuonRec yields consistent relative gains in NDCG@10, averaging 12.6% across all settings, with particularly pronounced improvements in generative recommendation models. These results consistently outperform strong Adam/AdamW baselines, positioning Muon as a promising new optimizer standard for RecSys training. Our code is available.
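The core mechanism the abstract describes, orthogonalized momentum via Newton-Schulz iteration on 2D weight matrices, can be sketched as follows. This is a minimal illustration, not the paper's code: the quintic coefficients follow the public Muon reference implementation, and `muon_step`, its hyperparameters, and the shape-based scaling are illustrative assumptions about how such an update is typically applied.

```python
import numpy as np

# Quintic Newton-Schulz coefficients from the public Muon reference
# implementation; MuonRec's exact recipe may differ.
NS_COEFFS = (3.4445, -4.7750, 2.0315)

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately push G's singular values toward 1 while keeping its
    singular vectors, via repeated application of a quintic polynomial map."""
    a, b, c = NS_COEFFS
    X = G / (np.linalg.norm(G) + eps)  # Frobenius norm bounds the spectral norm
    transposed = X.shape[0] > X.shape[1]
    if transposed:                     # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One Muon-style update for a 2D weight matrix (hypothetical sketch):
    accumulate momentum, orthogonalize the buffer, then apply the result
    as the update direction."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    # scale so update magnitude is roughly independent of matrix shape
    update *= max(1.0, weight.shape[0] / weight.shape[1]) ** 0.5
    return weight - lr * update, momentum
```

Because the orthogonalized update has near-uniform singular values, no single direction in the momentum buffer dominates the step, which is the "diverse update directions" property the abstract attributes to Muon.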
Problem

Research questions and friction points this paper is trying to address.

- recommender systems
- optimizer
- scalable recommendation
- generative recommendation
- Adam
Innovation

Methods, ideas, or system contributions that make the work stand out.

- Muon optimizer
- orthogonalized momentum
- generative recommendation
- scalable RecSys
- Newton-Schulz iteration