🤖 AI Summary
This study investigates the feasibility of data minimization for implicit feedback in recommender systems—i.e., performing inference using only the minimal necessary user data while preserving recommendation performance. We propose a novel formalization of data minimization and systematically analyze how model architecture, optimization objectives, and user behavioral characteristics influence data necessity. Leveraging diverse data reduction techniques, we conduct empirical evaluations across multiple recommendation models. Results demonstrate that inference-time data requirements can be reduced by 40–80% on average, with NDCG@10 degradation typically under 3%, confirming technical feasibility. However, minimization efficacy is highly contingent on system configuration and user activity levels. Our key contributions are: (1) the first quantitative, evaluable formulation of data minimization as a distinct subproblem in recommendation; and (2) an empirical characterization of its dual nature—technically viable yet practically complex to deploy.
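As a rough illustration of the evaluation implied above (not code from the paper), the sketch below computes NDCG@10 for a ranking produced from a user's full interaction history versus one produced by the same model from a minimized history. The rankings, item IDs, and held-out relevant set are hypothetical placeholders; in the study, the relative NDCG@10 drop is what is reported to typically stay under 3%.

```python
import math

def ndcg_at_k(ranked_items, relevant_items, k=10):
    """NDCG@k: discounted gain of hits in the top-k ranking,
    normalized by the ideal DCG (all relevant items ranked first)."""
    dcg = sum(
        1.0 / math.log2(rank + 2)              # rank is 0-based
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant_items
    )
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(r + 2) for r in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical comparison: the ranking a model produces from the full
# history vs. the ranking it produces from a minimized (reduced) history.
full_ranking      = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
reduced_ranking   = ["b", "a", "c", "e", "d", "f", "g", "h", "i", "j"]
held_out_relevant = {"a", "c", "f"}              # test-set positives

ndcg_full     = ndcg_at_k(full_ranking, held_out_relevant)
ndcg_reduced  = ndcg_at_k(reduced_ranking, held_out_relevant)
relative_drop = (ndcg_full - ndcg_reduced) / ndcg_full
```

Measuring the *relative* drop, rather than absolute NDCG, is what allows a per-user or per-model feasibility judgment of the kind the study describes.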
📝 Abstract
Data minimization is a legal principle requiring personal data processing to be limited to what is necessary for a specified purpose. Operationalizing this principle for recommender systems, which rely on extensive personal data, remains a significant challenge. This paper presents a feasibility study on minimizing the implicit feedback used at inference time in such systems. We propose a novel problem formulation, analyze various minimization techniques, and investigate key factors influencing their effectiveness. We demonstrate that substantial reduction of inference data is technically feasible without significant performance loss. Its practicality, however, is critically determined by two factors: the technical setting (e.g., performance targets, choice of model) and user characteristics (e.g., history size, preference complexity). Thus, while we establish technical feasibility, data minimization remains practically challenging: its dependence on the technical and user context makes a universal standard for data 'necessity' difficult to implement.