🤖 AI Summary
Existing linear autoencoders (LAEs) model textual item information via sparse word co-occurrence, which limits their semantic representation capability. To address this, the authors propose L3AE, the first recommendation model to integrate large language model (LLM) embeddings into a linear autoencoder framework. The approach tackles the semantic bottleneck of linear models with two key components: (1) using LLMs to extract high-quality textual item representations and construct a semantic item-to-item correlation matrix; and (2) a two-phase optimization strategy that learns collaborative item-to-item weights while distilling the semantic correlations as regularization. Each phase is solved in closed form, ensuring both global optimality and computational efficiency. Extensive experiments on three benchmark datasets show that L3AE consistently outperforms state-of-the-art methods, improving Recall@20 by 27.6% and NDCG@20 by 39.3%, and effectively overcoming the semantic expressiveness limitations of conventional linear models.
📝 Abstract
Large language models (LLMs) have been widely adopted to enrich the semantic representation of textual item information in recommender systems. However, existing linear autoencoders (LAEs) that incorporate textual information rely on sparse word co-occurrence patterns, limiting their ability to capture rich textual semantics. To address this, we propose L3AE, the first integration of LLMs into the LAE framework. L3AE effectively integrates the heterogeneous knowledge of textual semantics and user-item interactions through a two-phase optimization strategy. (i) L3AE first constructs a semantic item-to-item correlation matrix from LLM-derived item representations. (ii) It then learns an item-to-item weight matrix from collaborative signals while distilling semantic item correlations as regularization. Notably, each phase of L3AE is optimized through closed-form solutions, ensuring global optimality and computational efficiency. Extensive experiments demonstrate that L3AE consistently outperforms state-of-the-art LLM-enhanced models on three benchmark datasets, achieving gains of 27.6% in Recall@20 and 39.3% in NDCG@20. The source code is available at https://github.com/jaewan7599/L3AE_CIKM2025.
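The two-phase recipe in the abstract can be sketched with a small NumPy example. This is a minimal illustration, not the paper's verified formulation: the cosine-similarity choice for the semantic item matrix, the exact regularized objective, and all variable names are assumptions, and the real L3AE likely adds constraints (e.g., an EASE-style zero diagonal) that are omitted here.

```python
# Sketch of a two-phase linear-autoencoder fit with semantic regularization.
# Assumed objective: ||X - XB||^2 + lam*||B||^2 + gamma*||B - S||^2,
# whose gradient condition gives the closed form solved below.
import numpy as np

def semantic_item_matrix(item_emb: np.ndarray) -> np.ndarray:
    """Phase (i): item-to-item correlation from (assumed) LLM item
    embeddings, here a simple cosine-similarity matrix."""
    norms = np.linalg.norm(item_emb, axis=1, keepdims=True)
    unit = item_emb / np.maximum(norms, 1e-12)
    return unit @ unit.T

def fit_weights(X: np.ndarray, S: np.ndarray,
                lam: float = 1.0, gamma: float = 1.0) -> np.ndarray:
    """Phase (ii): closed-form item-to-item weights B.
    Setting the gradient of the assumed objective to zero yields
    (X^T X + (lam + gamma) I) B = X^T X + gamma S."""
    G = X.T @ X                      # item-item co-occurrence (Gram matrix)
    n = G.shape[0]
    return np.linalg.solve(G + (lam + gamma) * np.eye(n), G + gamma * S)

# Toy usage: 4 users x 3 items interactions, 3 items x 5-dim embeddings.
X = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]], dtype=float)
E = np.random.default_rng(0).normal(size=(3, 5))
S = semantic_item_matrix(E)
B = fit_weights(X, S, lam=0.5, gamma=0.5)
scores = X @ B  # reconstructed scores, ranked to recommend unseen items
```

Because both phases reduce to matrix operations with a single linear solve, no iterative training is needed, which is the source of the efficiency claim in the abstract.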