🤖 AI Summary
This study addresses **value alignment without model fine-tuning or dynamic prompt optimization**: how to guide large language models (LLMs) to generate value-congruent text via *static prompt design* alone. Methodologically, it formalizes target human values, grounded in Schwartz’s Theory of Basic Human Values, constructs a structured dialogue dataset, and proposes a reproducible, model-agnostic prompt evaluation framework that quantifies both the *presence* of target values in generated outputs and the *incremental gain* over a baseline. Experiments with a Wizard-Vicuna variant show that prompts explicitly conditioned on target values measurably improve value consistency over a baseline prompt. The core contribution is a *quantifiable, static-prompt evaluation paradigm* for aligning with values that may vary across situations, enabling lightweight, interpretable, and controllable value guidance in LLMs.
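To make the *presence* and *gain* scores concrete, here is a minimal Python sketch. The zero-shot NLI classifier, the model checkpoint, and the function names are illustrative assumptions for this summary, not the paper's actual scoring implementation.

```python
# Hypothetical sketch of presence/gain scoring for Schwartz values.
# The zero-shot classifier is an assumed stand-in detector, not the paper's method.
from transformers import pipeline

SCHWARTZ_VALUES = [
    "self-direction", "stimulation", "hedonism", "achievement", "power",
    "security", "conformity", "tradition", "benevolence", "universalism",
]

# Zero-shot NLI model used here as an assumed value detector.
detector = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def value_presence(text: str, value: str) -> float:
    """Score in [0, 1] for how strongly `text` expresses the target value."""
    result = detector(text, candidate_labels=SCHWARTZ_VALUES)
    return dict(zip(result["labels"], result["scores"]))[value]

def value_gain(baseline_text: str, steered_text: str, value: str) -> float:
    """Incremental presence of `value` gained by the value-conditioned prompt."""
    return value_presence(steered_text, value) - value_presence(baseline_text, value)
```

A positive gain indicates that the value-conditioned prompt increased the expression of the target value relative to the baseline output.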
📝 Abstract
Large language models are increasingly used in applications where alignment with human values is critical. While model fine-tuning is often employed to ensure safe responses, this technique is static and does not lend itself to everyday situations involving dynamic values and preferences. In this paper, we present a practical, reproducible, and model-agnostic procedure to evaluate whether a prompt candidate can effectively steer generated text toward specific human values, formalising a scoring method to quantify the presence and gain of target values in generated responses. We apply our method to a variant of the Wizard-Vicuna language model, using Schwartz's theory of basic human values and a structured evaluation through a dialogue dataset. With this setup, we compare a baseline prompt to one explicitly conditioned on values, and show that value steering is possible even without altering the model or dynamically optimising prompts.
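For concreteness, the comparison between a baseline prompt and a value-conditioned prompt might be organised as in the sketch below; the prompt templates and the `generate`/`gain` callables are hypothetical stand-ins, since the abstract does not specify them.

```python
# Hedged sketch of the evaluation loop over a dialogue dataset: generate with
# a baseline prompt and a value-conditioned prompt, then average the gain of
# the target value. Prompt wording and callables are illustrative assumptions.
from statistics import mean
from typing import Callable

BASELINE_TEMPLATE = "You are a helpful assistant. Continue the dialogue:\n{dialogue}"
STEERED_TEMPLATE = (
    "You are a helpful assistant who deeply values {value}. "
    "Continue the dialogue:\n{dialogue}"
)

def mean_value_gain(
    dialogues: list[str],
    value: str,
    generate: Callable[[str], str],          # wraps the LLM under evaluation
    gain: Callable[[str, str, str], float],  # e.g. value_gain from the sketch above
) -> float:
    """Average per-dialogue gain of `value` under the value-conditioned prompt."""
    gains = []
    for dialogue in dialogues:
        base_out = generate(BASELINE_TEMPLATE.format(dialogue=dialogue))
        steer_out = generate(STEERED_TEMPLATE.format(value=value, dialogue=dialogue))
        gains.append(gain(base_out, steer_out, value))
    return mean(gains)
```

Aggregating the gain across dialogues yields a single score per value and per prompt candidate, which is what makes the comparison reproducible and model-agnostic: only the `generate` wrapper changes when a different LLM is evaluated.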