How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models often exhibit undesirable behaviors, such as intent misalignment and personality inconsistency, in sensitive social contexts, underscoring the need for a systematic evaluation framework. This work proposes SteerEval, the first three-tiered controllability benchmark spanning multiple behavioral granularities: it unifies assessment across linguistic features, affective states, and personality traits through three hierarchical specification levels, L1 (what to express), L2 (how to express it), and L3 (how to instantiate it). By combining hierarchical behavioral modeling, multidimensional metrics, and a systematic comparison of mainstream steering methods, the study reveals a significant performance drop in existing approaches at finer-grained levels. These findings demonstrate both the necessity and the effectiveness of SteerEval in advancing research toward safe, interpretable, and controllable large language models.

📝 Abstract
Large Language Models (LLMs) are increasingly deployed in socially sensitive domains, yet their unpredictable behaviors, ranging from misaligned intent to inconsistent personality, pose significant risks. We introduce SteerEval, a hierarchical benchmark for evaluating LLM controllability across three domains: language features, sentiment, and personality. Each domain is structured into three specification levels: L1 (what to express), L2 (how to express), and L3 (how to instantiate), connecting high-level behavioral intent to concrete textual output. Using SteerEval, we systematically evaluate contemporary steering methods, revealing that control often degrades at finer-grained levels. Our benchmark offers a principled and interpretable framework for safe and controllable LLM behavior, serving as a foundation for future research.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models, controllability, behavioral consistency, steering, evaluation benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

controllability, hierarchical benchmark, behavioral granularity, large language models, steering evaluation
Authors
Ziwen Xu (Zhejiang University)
Kewei Xu (Zhejiang University)
Haoming Xu (Zhejiang University)
Haiwen Hong (Alibaba Group)
Longtao Huang (Alibaba Group; Knowledge Graph, Service Computing, Data Mining)
Hui Xue (Alibaba Group)
Ningyu Zhang (Ph.D. Student, Vanderbilt University; artificial intelligence, learning analytics, learning environments)
Yongliang Shen (Zhejiang University)
Guozhou Zheng (Zhejiang University)
Huajun Chen (Zhejiang University)
Shumin Deng (National University of Singapore; NLP, LLM Planning & Reasoning, LLM Agent, KG, IE)