Revisiting LLM Value Probing Strategies: Are They Robust and Expressive?

📅 2025-07-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically evaluates the robustness and expressive capacity of value-orientation probing methods for large language models (LLMs), with emphasis on their ability to capture context sensitivity and alignment with real-world behavioral preferences. We introduce two novel tasks: demographic context sensitivity testing and behavioral alignment assessment. Leveraging prompt perturbations, answer-option shuffling, and a dual-paradigm framework—free-form generation followed by preference selection—we quantitatively benchmark three mainstream probing approaches. Results reveal poor stability under input perturbations, negligible responsiveness to demographic cues, and only weak correlation between probed values and actual model behavior. To our knowledge, this work is the first to empirically expose a substantial gap between value probing and behavioral instantiation. It provides both theoretical grounding and empirical evidence for developing more reliable, interpretable frameworks for value alignment evaluation.

📝 Abstract
There has been extensive research on assessing the value orientation of Large Language Models (LLMs), as it can shape user experiences across demographic groups. However, several challenges remain. First, while the Multiple Choice Question (MCQ) setting has been shown to be vulnerable to perturbations, there has been no systematic comparison of probing methods for values. Second, it is unclear to what extent the probed values capture in-context information and reflect models' preferences for real-world actions. In this paper, we evaluate the robustness and expressiveness of value representations across three widely used probing strategies. Using variations in prompts and options, we show that all methods exhibit large variance under input perturbations. We also introduce two tasks studying whether the values are responsive to demographic context, and how well they align with the models' behaviors in value-related scenarios. We show that demographic context has little effect on free-text generation, and that the models' values only weakly correlate with their preference for value-based actions. Our work highlights the need for a more careful examination of LLM value probing and awareness of its limitations.
Problem

Research questions and friction points this paper is trying to address.

Evaluate robustness of LLM value probing methods under perturbations
Assess if probed values reflect real-world action preferences
Examine demographic context impact on value representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates robustness of value probing strategies
Introduces tasks for demographic context responsiveness
Assesses alignment of values with model behaviors
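One of the robustness checks summarized above is answer-option shuffling: if a model's probed value flips when MCQ options are merely reordered, the probe reflects position bias rather than a stable value. A minimal sketch of such a stability check follows; `probe_fn`, `stability`, and the example options are hypothetical illustrations, not the paper's actual implementation.

```python
import itertools

def shuffled_variants(options):
    """All orderings of the answer options for an MCQ probe.
    Reordering options while tracking their content lets us test
    whether a probe's choice depends on position or on substance."""
    return list(itertools.permutations(options))

def stability(probe_fn, question, options):
    """Fraction of option orderings under which probe_fn returns the
    same underlying option. probe_fn(question, options) -> option text.
    1.0 means the probed choice is invariant to option order."""
    choices = [probe_fn(question, list(p)) for p in shuffled_variants(options)]
    most_common = max(set(choices), key=choices.count)
    return choices.count(most_common) / len(choices)

# Hypothetical probes standing in for an LLM call:
# one with extreme position bias (always picks the first-listed option),
# one keyed to option content regardless of position.
biased = lambda q, opts: opts[0]
consistent = lambda q, opts: next(o for o in opts if "tradition" in o)

opts = ["I value tradition", "I value novelty", "I value security"]
print(stability(consistent, "Which matters most to you?", opts))  # 1.0
print(stability(biased, "Which matters most to you?", opts))      # ~0.33
```

In practice `probe_fn` would wrap an LLM query; a low stability score is the kind of perturbation sensitivity the paper reports for MCQ-style value probes.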