🤖 AI Summary
Existing large language models exhibit coarse-grained stylistic control and unreliable evaluation in open-ended story generation. Method: We propose a style-conditioned training framework integrating fine-grained style modeling with a multi-objective reward mechanism: a style reward derived from authorship verification signals, jointly optimized with content coherence and narrative completeness scores via Group Relative Policy Optimization (GRPO) on an 8B-parameter model; additionally, a fine-tuned sentence transformer serves as a style discriminator to enable end-to-end style alignment. Contribution/Results: On Mark Twain–style story generation, our method achieves a style score of 0.628—significantly outperforming GPT-4o and Claude Sonnet 4—demonstrating superior stylistic consistency and competitive content quality. To our knowledge, this is the first work to achieve stylistic alignment capability surpassing that of larger models using only a medium-scale model.
📝 Abstract
Recent advances in large language models (LLMs) show impressive performance in open-ended story generation, but fine-grained stylistic control remains limited. Existing methods often rely on shallow cues (e.g., names or topics) to simulate authorial style, without robust evaluation. In this work, we present a training framework for style-conditioned story generation using Group Relative Policy Optimization (GRPO) and a custom multi-reward setup. The style reward is derived from a fine-tuned sentence transformer using authorship verification (AV) signals, combined with content and completeness scores to stabilize long-form narrative generation. We conduct experiments using fiction by Mark Twain, a prominent 19th-century American author, with The Adventures of Huckleberry Finn serving as the reference style exemplar. Our 8B model outperforms larger baselines such as GPT-4o and Claude Sonnet 4 in AV-style metrics, achieving a style score of 0.628 and competitive content quality. Results demonstrate the feasibility of agentic stylistic generation with moderate model size and task-specific training. While the output is clearly style-aligned, narrative completeness remains a challenge, indicating future work is needed to better model global coherence and story resolution.