🤖 AI Summary
This work addresses a limitation of existing priority-aware Shapley value (PASV) methods: they require pairwise priorities to be binary and acyclic, so they cannot handle the cyclic or weighted precedence relations common in real data, such as aggregated human preferences and multi-criterion comparisons. We propose the generalized priority-aware Shapley value (GPASV), a random order value defined on arbitrary directed weighted priority graphs. Instead of enforcing hard ordering constraints, GPASV uses edge weights as soft penalties for violations of the preferred ordering, and it recovers several classical models as boundary cases. We ground GPASV in an axiomatic characterization, develop algorithms based on random order sampling and graph-weight reconstruction, and introduce a priority sweeping diagnostic that extends the one available for PASV. Experiments on the cyclic Chatbot Arena preference graph show that different balances of pairwise graph priority and individual soft priority produce substantively different valuations of the same large language model ensemble.
📝 Abstract
Shapley value and its priority-aware extensions are widely used for valuation in machine learning, but existing methods require pairwise priority to be binary and acyclic, a restriction spectacularly violated in real-data examples such as aggregated human preferences and multi-criterion comparisons. We introduce the generalized priority-aware Shapley value (GPASV), a random order value defined on arbitrary directed weighted priority graphs, in which pairwise edges penalize rather than forbid order violations. GPASV covers a range of classical models as boundary cases. We establish GPASV through an axiomatic characterization, develop the associated computational methods, and introduce a priority sweeping diagnostic extending PASV's. We apply GPASV to LLM ensemble valuation on the cyclic Chatbot Arena preference graph, illustrating that priority-aware valuation is not a one-button operation: different balances of pairwise graph priority versus individual soft priority produce substantively different valuations of the same data.
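To make the phrase "edges penalize rather than forbid order violations" concrete, the following is a minimal sketch of one possible soft-penalty random order value. It is not the paper's definition of GPASV: the exponential weighting `exp(-strength * penalty)`, the `strength` parameter, and the function names `violation_penalty` and `soft_priority_value` are all assumptions introduced here for illustration. The sketch enumerates every ordering exactly, so it is only practical for a handful of players.

```python
from itertools import permutations
from math import exp


def violation_penalty(order, edges):
    """Sum the weights of edges (u, v, w) whose preferred order 'u before v' is violated."""
    position = {player: idx for idx, player in enumerate(order)}
    return sum(w for u, v, w in edges if position[u] > position[v])


def soft_priority_value(players, utility, edges, strength=1.0):
    """Exact random order value under an ASSUMED soft-penalty distribution.

    Each ordering gets probability proportional to
    exp(-strength * violation_penalty), which is an illustrative choice,
    not the weighting defined in the paper.
    """
    orders = list(permutations(players))
    raw = [exp(-strength * violation_penalty(o, edges)) for o in orders]
    total = sum(raw)
    values = {p: 0.0 for p in players}
    for order, r in zip(orders, raw):
        prob = r / total
        coalition = frozenset()
        for p in order:
            grown = coalition | {p}
            # Marginal contribution of p when joining its predecessors.
            values[p] += prob * (utility(grown) - utility(coalition))
            coalition = grown
    return values


# Hypothetical toy game: three players, utility grows with coalition size.
players = ["a", "b", "c"]
utility = lambda coalition: len(coalition) ** 2

# A cyclic weighted priority graph: a before b, b before c, c before a.
# Every ordering violates at least one edge, yet none is forbidden.
cyclic_edges = [("a", "b", 2.0), ("b", "c", 1.0), ("c", "a", 0.5)]
values = soft_priority_value(players, utility, cyclic_edges, strength=1.0)
```

In this sketch, setting `strength=0` makes all orderings equally likely and gives back the classical Shapley value, while larger values concentrate probability on orderings that violate less edge weight. Because every ordering keeps positive probability, a cyclic graph poses no difficulty, which is the property a hard-constraint formulation lacks.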