What Matters in LLM-Based Feature Extractor for Recommender? A Systematic Analysis of Prompts, Models, and Adaptation

📅 2025-09-18

📈 Citations: 0

✨ Influential: 0

career value

182K/year

🤖 AI Summary

Existing studies employ large language models (LLMs) as semantic feature extractors for sequential recommendation systems (SRS), but suffer from highly heterogeneous prompt design, architectural choices, and adaptation strategies—hindering fair attribution of design factors. To address this, we propose RecXplore, the first modular analytical framework that decouples LLM-driven sequential recommendation into four independently evaluable components: data processing, feature extraction, feature adaptation, and sequence modeling. This enables standardized ablation studies and systematic discovery of effective design patterns. Evaluated on four public benchmark datasets, RecXplore achieves end-to-end improvements of +18.7% in NDCG@5 and +12.7% in HR@5 over strong baselines by composing state-of-the-art modules. Our core contribution is establishing the first decomposable, reproducible, and comparable analytical paradigm for LLM-based feature extraction in sequential recommendation.

Technology Category

Application Category

📝 Abstract

Using Large Language Models (LLMs) to generate semantic features has been demonstrated as a powerful paradigm for enhancing Sequential Recommender Systems (SRS). This typically involves three stages: processing item text, extracting features with LLMs, and adapting them for downstream models. However, existing methods vary widely in prompting, architecture, and adaptation strategies, making it difficult to fairly compare design choices and identify what truly drives performance. In this work, we propose RecXplore, a modular analytical framework that decomposes the LLM-as-feature-extractor pipeline into four modules: data processing, semantic feature extraction, feature adaptation, and sequential modeling. Instead of proposing new techniques, RecXplore revisits and organizes established methods, enabling systematic exploration of each module in isolation. Experiments on four public datasets show that simply combining the best designs from existing techniques without exhaustive search yields up to 18.7% relative improvement in NDCG@5 and 12.7% in HR@5 over strong baselines. These results underscore the utility of modular benchmarking for identifying effective design patterns and promoting standardized research in LLM-enhanced recommendation.

Problem

Research questions and friction points this paper is trying to address.

Systematically analyzes LLM-based feature extraction for recommenders

Identifies key factors in prompts, models, and adaptation strategies

Proposes modular framework to evaluate design choices fairly

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular framework for feature extraction

Systematic analysis of existing methods

Combining best designs without exhaustive search

🔎 Similar Papers

Tapping the Potential of Large Language Models as Recommender Systems: A Comprehensive Framework and Empirical Analysis