🤖 AI Summary
Current large language models offer only limited fine-grained control over multiple stylistic or semantic concepts, such as simultaneously modulating humor and persuasiveness, and no systematic framework exists for evaluating such control. This work proposes the first benchmark specifically designed to assess fine-grained text control in both single- and dual-concept settings, focusing on semantically distinguishable concept pairs. Through a comparative analysis of prompt engineering and representation-based approaches, it formulates controllable generation tasks and introduces quantitative metrics for evaluating performance. Extensive experiments across multiple mainstream models and tasks reveal a significant performance drop when two concepts are controlled jointly, exposing for the first time the limitations of existing prompting methods under compositional assumptions. The framework establishes a new foundation and direction for research on multi-concept controllable text generation.
📝 Abstract
Large Language Models (LLMs) offer strong generative capabilities, but many applications require explicit and *fine-grained* control over specific textual concepts, such as humor, persuasiveness, or formality. Prior approaches in prompting and representation engineering can provide coarse or single-attribute control, but systematic evaluation of multi-attribute settings remains limited. We introduce an evaluation framework for fine-grained controllability in both single- and dual-concept scenarios, focusing on linguistically distinct concept pairs (e.g., persuasiveness vs. humor). Surprisingly, across multiple LLMs and generative tasks, we find that performance often drops in the dual-concept setting, even though the chosen concepts should in principle be separable. This reveals a fundamental limitation of naive prompting-based control: models struggle with compositionality even when concepts are intuitively independent. Our framework provides systematic evidence of this gap and offers a principled approach for measuring the ability of future methods for multi-concept control.
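To make the evaluation idea concrete, here is a minimal sketch (not the paper's actual code) of the single- vs. dual-concept comparison the abstract describes: compose a control prompt for one or two concepts, then measure the drop from the averaged single-concept scores to the joint dual-concept score. The prompt template, the `compositional_gap` metric name, and the example scores are all illustrative assumptions, not the benchmark's real prompts or metrics.

```python
# Hypothetical sketch of single- vs dual-concept control evaluation.
# Prompt wording and the numeric scores below are illustrative stand-ins.

def control_prompt(task: str, concepts: list[str]) -> str:
    """Compose a prompt asking the model to exhibit the given concepts."""
    attrs = " and ".join(concepts)
    return f"Write a {task} that is highly {attrs}."

def compositional_gap(single_scores: dict[str, float], dual_score: float) -> float:
    """Drop from the mean single-concept score to the joint dual-concept score.

    A positive gap means control degrades when concepts are requested together.
    """
    baseline = sum(single_scores.values()) / len(single_scores)
    return baseline - dual_score

# Made-up concept scores in [0, 1] for illustration:
singles = {"humor": 0.82, "persuasiveness": 0.78}
dual = 0.55  # score when both concepts are requested jointly
gap = compositional_gap(singles, dual)

print(control_prompt("product review", ["humorous", "persuasive"]))
print(f"compositional gap: {gap:.2f}")
```

A gap near zero would indicate that the two concepts compose cleanly; the large drops reported across models correspond to a substantially positive gap in this formulation.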