Capturing Classic Authorial Style in Long-Form Story Generation with GRPO Fine-Tuning

📅 2025-12-05

📈 Citations: 0

✨ Influential: 0

career value

167K/year

🤖 AI Summary

Existing large language models exhibit coarse-grained stylistic control and unreliable evaluation in open-ended story generation. Method: We propose a style-conditioned training framework integrating fine-grained style modeling with a multi-objective reward mechanism: a style reward derived from authorship verification signals, jointly optimized with content coherence and narrative completeness scores via Group Relative Policy Optimization (GRPO) on an 8B-parameter model; additionally, a fine-tuned sentence transformer serves as a style discriminator to enable end-to-end style alignment. Contribution/Results: On Mark Twain–style story generation, our method achieves a style score of 0.628—significantly outperforming GPT-4o and Claude Sonnet 4—demonstrating superior stylistic consistency and competitive content quality. To our knowledge, this is the first work to achieve stylistic alignment capability surpassing that of larger models using only a medium-scale model.

Technology Category

Application Category

📝 Abstract

Recent advances in large language models (LLMs) show impressive performance in open-ended story generation, but fine-grained stylistic control remains limited. Existing methods often rely on shallow cues (e.g., names or topics) to simulate authorial style, without robust evaluation. In this work, we present a training framework for style-conditioned story generation using Group Relative Policy Optimization (GRPO) and a custom multi-reward setup. The style reward is derived from a fine-tuned sentence transformer using authorship verification (AV) signals, combined with content and completeness scores to stabilize long-form narrative generation. We conduct experiments using fiction by Mark Twain, a prominent 19th-century American author, with The Adventures of Huckleberry Finn serving as the reference style exemplar. Our 8B model outperforms larger baselines such as GPT-4o and Claude Sonnet 4 in AV-style metrics, achieving a style score of 0.628 and competitive content quality. Results demonstrate the feasibility of agentic stylistic generation with moderate model size and task-specific training. While the output is clearly style-aligned, narrative completeness remains a challenge, indicating future work is needed to better model global coherence and story resolution.

Problem

Research questions and friction points this paper is trying to address.

Achieving fine-grained authorial style control in long-form story generation

Overcoming shallow stylistic cues through robust multi-reward optimization framework

Balancing style alignment with narrative completeness in extended storytelling

Innovation

Methods, ideas, or system contributions that make the work stand out.

GRPO fine-tuning for style-conditioned story generation

Multi-reward setup with authorship verification style scoring

Moderate-sized model outperforms larger baselines in style metrics

🔎 Similar Papers

Agents' Room: Narrative Generation through Multi-step Collaboration

2024-10-03arXiv.orgCitations: 4

Qualcomm

$104,000.00 - $156,000.00

San Diego, California, United States of America

Authors to Follow