🤖 AI Summary
This work addresses a key limitation in existing multimodal foundation model–based recommender systems: even with parameter-efficient fine-tuning (PEFT), they produce uniform item embeddings that fail to capture heterogeneous user interests. To overcome this, we propose PerPEFT, the first approach to integrate user-interest clustering with PEFT. Specifically, users are grouped by interest, and each group is assigned a dedicated lightweight fine-tuning module, enabling item embeddings to reflect group-specific, fine-grained characteristics. PerPEFT is compatible with any PEFT technique, adds only 1.3% of the foundation model's parameters, and achieves consistent performance gains across multiple PEFT variants—yielding up to a 15.3% improvement in NDCG@20—demonstrating generality, scalability, and efficient personalization.
📝 Abstract
In recent years, substantial research has integrated multimodal item metadata into recommender systems, often by using pre-trained multimodal foundation models to encode such data. Since these models are not originally trained for recommendation tasks, recent works efficiently adapt them via parameter-efficient fine-tuning (PEFT). However, even with PEFT, item embeddings from multimodal foundation models remain user-blind: item embeddings are not conditioned on user interests, despite the fact that users with diverse interests attend to different item aspects. To address this limitation, we propose PerPEFT, a personalized PEFT strategy for multimodal recommendation. Specifically, PerPEFT groups users by interest and assigns a distinct PEFT module to each group, enabling each module to capture the fine-grained item aspects most predictive of that group's purchase decisions. We further introduce a specialized training technique that strengthens this user-group conditioning. Notably, PerPEFT is PEFT-agnostic and can be paired with any PEFT method applicable to multimodal foundation models. Through extensive experiments, we show that (1) PerPEFT outperforms the strongest baseline by up to 15.3% (NDCG@20) and (2) delivers consistent gains across diverse PEFT variants. Despite this personalization, PerPEFT remains lightweight, adding only 1.3% of the foundation model's parameter count. We provide our code and datasets at https://github.com/kswoo97/PerPEFT.
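To make the core idea concrete, the sketch below illustrates one plausible instantiation of group-specific PEFT modules: a frozen encoder weight shared by all users, plus one LoRA-style low-rank adapter per user-interest group, so the same item is embedded differently depending on the viewing user's group. This is a minimal illustration, not the authors' implementation; all shapes, names, and the choice of LoRA as the PEFT variant are assumptions (PerPEFT itself is PEFT-agnostic).

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, n_groups = 16, 8, 2, 3

# Frozen projection standing in for the multimodal foundation model's weights.
W_frozen = rng.normal(size=(d_out, d_in))

# One lightweight low-rank adapter (B @ A) per user-interest group.
# Adapter parameters: n_groups * rank * (d_in + d_out), a small fraction
# of the frozen d_out * d_in weights.
adapters = [
    (rng.normal(scale=0.01, size=(d_out, rank)),   # B
     rng.normal(scale=0.01, size=(rank, d_in)))    # A
    for _ in range(n_groups)
]

def encode_item(x, group_id):
    """Embed item features x, conditioned on the user's interest group."""
    B, A = adapters[group_id]
    return (W_frozen + B @ A) @ x  # frozen path + group-specific update

x = rng.normal(size=d_in)            # hypothetical item feature vector
embs = [encode_item(x, g) for g in range(n_groups)]
```

The same item thus receives a (slightly) different embedding per group, which is the user-conditioning the paper argues plain PEFT lacks; at inference, a user's cluster assignment would select which adapter to apply.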