AI Summary
This work addresses the lack of data-driven, fine-grained evaluation criteria for personalized question answering with large language models. The authors propose CoPA, a benchmark that mines real user interaction data for Community-Individual Preference Divergence (CIPD), cases where an individual's choice overrides the community consensus, and distills six interpretable cognitive factors underlying personalization. Built on 1,985 user profiles, CoPA measures alignment at the factor level. The authors report that CoPA offers a more discriminative and comprehensive standard for evaluating personalized question-answering systems than conventional approaches based on lexical similarity or handcrafted rules.
Abstract
While LLMs have demonstrated remarkable potential in Question Answering (QA), evaluating personalization remains a critical bottleneck. Existing paradigms predominantly rely on lexical-level similarity or manual heuristics, often lacking sufficient data-driven validation. We address this by mining Community-Individual Preference Divergence (CIPD), where individual choices override consensus, to distill six key personalization factors as evaluative dimensions. Accordingly, we introduce CoPA, a benchmark with 1,985 user profiles for fine-grained, factor-level assessment. By quantifying the alignment between model outputs and user-specific cognitive preferences inferred from interaction patterns, CoPA provides a more comprehensive and discriminative standard for evaluating personalized QA than generic metrics. The code is available at https://github.com/bjzgcai/CoPA.
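As a rough illustration of the divergence idea described above, the following sketch flags questions where the asker's chosen answer is not the one the community ranked highest. This is only one plausible reading of "individual choices override consensus". The `Answer` and `Question` types, the `community_votes` and `chosen_answer_id` fields, and the strict-inequality rule are all assumptions made for this example and are not taken from the CoPA codebase.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Answer:
    answer_id: str
    community_votes: int  # hypothetical field: net votes from the community


@dataclass
class Question:
    question_id: str
    user_id: str
    answers: List[Answer]
    chosen_answer_id: Optional[str]  # hypothetical field: answer picked by the asker


def is_divergent(question: Question) -> bool:
    """Return True when the asker chose an answer with strictly fewer
    community votes than the top-voted answer (assumed divergence rule)."""
    if question.chosen_answer_id is None or len(question.answers) < 2:
        return False
    chosen = next(
        (a for a in question.answers if a.answer_id == question.chosen_answer_id),
        None,
    )
    if chosen is None:
        return False
    top_votes = max(a.community_votes for a in question.answers)
    return chosen.community_votes < top_votes


def mine_divergent(questions: List[Question]) -> List[Question]:
    """Keep only the questions where individual choice and consensus disagree."""
    return [q for q in questions if is_divergent(q)]
```

Under this reading, the questions returned by `mine_divergent` would be the raw material from which personalization factors are later distilled. The distillation step itself is not described in enough detail here to sketch.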