🤖 AI Summary
This work addresses systematic limitations in existing alignment datasets, particularly their weak modeling of audience preferences and thin coverage of real-world logic. Working under deliberately strict conditions (low data cost and a small base model), the authors propose a low-resource Creative Quality Alignment (CQA) approach trained on only about one hundred expert-annotated chain-of-thought (CoT) examples. They identify a duality between appreciation and generation in architectures built on a single conditional distribution, through which calibration of the appreciation side transfers automatically to the generation side. Experimental results demonstrate that the framework substantially mitigates the shortcomings of current datasets and validate the practical feasibility of aligning generative models with nuanced creative quality metrics in real-world engineering settings.
📝 Abstract
This paper provides an empirical implementation of the creative quality metric proposed in Calibrated Surprise (Zou & Xu, 2026a). The question it addresses is: does the mathematical claim behind that metric hold at the engineering level?
To make the answer as general as possible, we deliberately choose the strictest engineering conditions: low data cost and a small base model. Training data comes from approximately 100 expert chain-of-thought (CoT) annotations produced by the BC Protocol (Zou & Xu, 2026b).
We also identify a data bias: most publicly available alignment datasets are skewed toward craft-related knowledge, while audience modeling and reality-logic coverage are systematically weak.
We use the term Creative Quality Alignment (CQA) to describe this class of engineering methods. We also offer a supporting theoretical observation: in an LLM built on a single conditional distribution, calibration of the appreciation side transfers automatically to the generation side via architectural duality. This is the structural reason why ~100 CoT examples are sufficient, rather than a purely empirical observation of the kind reported in LIMA (Zhou et al., 2023).
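One way to read the duality observation is sketched below. The notation is an assumption introduced here for illustration, not taken from the paper: $\theta$ denotes the model parameters, $c$ a conditioning context, and $x = (x_1, \dots, x_T)$ a candidate text. The sketch uses only the standard autoregressive factorization of a language model and may differ from the authors' own formalization.

```latex
% Assumed notation (not from the paper): \theta = parameters, c = context, x = text.
% A single conditional distribution, factorized autoregressively:
\[
  p_\theta(x \mid c) \;=\; \prod_{t=1}^{T} p_\theta\!\left(x_t \mid c,\, x_{<t}\right)
\]
% Appreciation side (assumed reading): score a given text under the model.
\[
  s_\theta(x; c) \;=\; \log p_\theta(x \mid c)
\]
% Generation side: sample from the same distribution.
\[
  x \;\sim\; p_\theta(\,\cdot \mid c)
\]
```

Under this reading, both sides are functions of the same $\theta$: any update that changes the scores $s_\theta(x; c)$ necessarily changes the distribution from which texts are sampled, which is one plausible sense in which calibrating appreciation carries over to generation.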