🤖 AI Summary
Existing preference models treat human judgments as black-box processes, obscuring the underlying drivers of preferences and their cross-group variations. This paper introduces the first interpretable, multi-group-adaptive preference modeling framework. It decomposes human judgments into latent attributes, integrates counterfactual attribute synthesis with attention mechanisms, and dynamically learns group-specific attribute weights across distinct social communities (e.g., academic, conflict-oriented, supportive). Grounded in multi-attribute decision theory, cognitive science principles, and counterfactual data generation, the approach enables causal interpretation of preference formation mechanisms. Evaluated empirically on 45 Reddit communities, the model achieves an average prediction accuracy 46.6% higher than GPT-4o and substantially outperforms state-of-the-art black-box baselines. Moreover, it systematically uncovers community-specific preference patterns, offering actionable insights into socio-cognitive determinants of evaluative behavior.
📝 Abstract
Personalizing AI systems requires understanding not just what users prefer, but the reasons that underlie those preferences, yet current preference models typically treat human judgment as a black box. We introduce PrefPalette, a framework that decomposes preferences into attribute dimensions and tailors its preference prediction to distinct social community values in a human-interpretable manner. PrefPalette operationalizes a cognitive science principle known as multi-attribute decision making in two ways: (1) a scalable counterfactual attribute synthesis step that generates synthetic training data to isolate individual attribute effects (e.g., formality, humor, cultural values), and (2) attention-based preference modeling that learns how different social communities dynamically weight these attributes. This approach moves beyond aggregate preference modeling to capture the diverse evaluation frameworks that drive human judgment. When evaluated on 45 social communities from the online platform Reddit, PrefPalette outperforms GPT-4o by 46.6% in average prediction accuracy. Beyond raw predictive improvements, PrefPalette also sheds light on intuitive, community-specific profiles: scholarly communities prioritize verbosity and stimulation, conflict-oriented communities value sarcasm and directness, and support-based communities emphasize empathy. By modeling the attribute-mediated structure of human judgment, PrefPalette delivers both superior preference modeling and transparent, interpretable insights, and serves as a first step toward more trustworthy, value-aware personalized applications.
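To make the idea of community-specific attribute weighting concrete, here is a minimal Python sketch. It is an illustration only, not the PrefPalette architecture: the attribute list, the per-community logits, and the attribute scores are hypothetical, hand-set values, whereas the framework described above learns its weights from data. The sketch shows only how a softmax over per-attribute logits yields an interpretable weight profile, and how that profile can flip which of two responses a community prefers.

```python
import math

# Hypothetical attribute dimensions, chosen for illustration.
ATTRIBUTES = ["formality", "humor", "empathy", "sarcasm"]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attribute_weights(community_logits):
    """Turn one community's per-attribute logits into weights that sum to 1."""
    return softmax(community_logits)

def preference_score(attr_scores, weights):
    """Weighted sum of a response's attribute scores."""
    return sum(w * a for w, a in zip(weights, attr_scores))

def prefers_first(resp_a, resp_b, weights):
    """True if the community's weighting scores resp_a above resp_b."""
    return preference_score(resp_a, weights) > preference_score(resp_b, weights)

# Hand-set logits (assumption): a learned model would produce these instead.
COMMUNITY_LOGITS = {
    "support": [0.0, 0.0, 2.0, -1.0],   # leans toward empathy
    "conflict": [0.0, 0.5, -1.0, 2.0],  # leans toward sarcasm
}

# Hypothetical attribute scores for two candidate responses, in ATTRIBUTES order.
empathetic = [0.2, 0.1, 0.9, 0.0]
sarcastic = [0.1, 0.6, 0.1, 0.9]

for name, logits in COMMUNITY_LOGITS.items():
    w = attribute_weights(logits)
    winner = "empathetic" if prefers_first(empathetic, sarcastic, w) else "sarcastic"
    profile = ", ".join(f"{a}={x:.2f}" for a, x in zip(ATTRIBUTES, w))
    print(f"{name}: prefers {winner} ({profile})")
```

Because the weights are normalized per community, they can be read directly as a profile of what that community values, which is the kind of interpretability the abstract highlights.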