Towards Cognitively-Faithful Decision-Making Models to Improve AI Alignment

📅 2025-09-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AI preference alignment models neglect human cognitive mechanisms—such as heuristic simplification—resulting in cognitively implausible and poorly generalizable decision policies. Method: We propose an axiomatic, cognitively faithful decision-making framework: first performing pairwise comparisons over option attributes, then aggregating outcomes via fixed, interpretable rules. Our approach integrates cognitive decision theory with the Bradley–Terry model, constructing a structured information-processing pipeline grounded in empirically observed pairwise comparison data. Contribution/Results: Evaluated on kidney allocation, our model achieves state-of-the-art fidelity to human decisions while substantially improving interpretability and cross-task generalization—demonstrated via rigorous out-of-distribution validation. By grounding value alignment in empirically supported cognitive principles without sacrificing computational tractability, our framework offers a novel, principled pathway toward both cognitive plausibility and engineering feasibility in preference-aligned AI systems.

📝 Abstract
Recent AI work trends towards incorporating human-centric objectives, with the explicit goal of aligning AI models to personal preferences and societal values. Using standard preference elicitation methods, researchers and practitioners build models of human decisions and judgments, which are then used to align AI behavior with that of humans. However, models commonly used in such elicitation processes often do not capture the true cognitive processes of human decision making, such as when people use heuristics to simplify information associated with a decision problem. As a result, models learned from people's decisions often do not align with their cognitive processes, and cannot be used to validate the learning framework for generalization to other decision-making tasks. To address this limitation, we take an axiomatic approach to learning cognitively faithful decision processes from pairwise comparisons. Building on the vast literature characterizing the cognitive processes that contribute to human decision-making, and on recent work characterizing such processes in pairwise comparison tasks, we define a class of models in which individual features are first processed and compared across alternatives, and the processed features are then aggregated via a fixed rule, such as the Bradley-Terry rule. This structured processing of information ensures such models are realistic and feasible candidates to represent underlying human decision-making processes. We demonstrate the efficacy of this modeling approach in learning interpretable models of human decision making in a kidney allocation task, and show that our proposed models match or surpass the accuracy of prior models of human pairwise decision-making.
Problem

Research questions and friction points this paper is trying to address.

Improving AI alignment with cognitively faithful human decision models
Addressing limitations of standard preference elicitation methods
Learning interpretable models from pairwise comparisons via an axiomatic approach
Innovation

Methods, ideas, or system contributions that make the work stand out.

Axiomatic approach for cognitively faithful decision processes
Pairwise comparisons with structured feature processing
Bradley-Terry rule aggregation for realistic human modeling
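The two-stage structure described above — per-feature pairwise comparison followed by aggregation under a fixed, interpretable rule such as Bradley-Terry — can be sketched in code. This is a minimal illustration, not the paper's actual model: the function names, feature transforms, and weights are assumptions made for the example.

```python
import math

def bradley_terry_choice_prob(x, y, feature_fns, weights):
    """P(alternative x is chosen over y) under a two-stage sketch:
    each feature is processed and compared independently, then the
    per-feature comparisons are combined via the Bradley-Terry
    (logistic) rule. `feature_fns` and `weights` are illustrative
    placeholders, not quantities from the paper."""
    # Stage 1: per-feature processing and pairwise comparison.
    # Each f is a hypothetical monotone transform (e.g. identity, log).
    diffs = [f(a) - f(b) for f, a, b in zip(feature_fns, x, y)]
    # Stage 2: fixed aggregation rule — weighted sum of processed
    # feature comparisons, passed through the logistic function.
    score = sum(w * d for w, d in zip(weights, diffs))
    return 1.0 / (1.0 + math.exp(-score))

# Illustrative kidney-style alternatives: [recipient age, years waited].
# Weights and sign conventions below are made up for this sketch.
identity = lambda v: v
p = bradley_terry_choice_prob(
    [30, 5], [50, 2],
    feature_fns=[identity, identity],
    weights=[-0.05, 0.4],
)
```

Because the aggregation rule is fixed and each feature enters through its own interpretable comparison, the learned parameters (here, `weights` and the choice of `feature_fns`) can be read off directly, which is the sense in which such models remain interpretable candidates for the underlying human decision process.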