🤖 AI Summary
To address the limited ability of large language models (LLMs) to generate personalized long-form text, this paper proposes REST-PG (Reasoning-Enhanced Self-Training for Personalized Text Generation), a framework that trains LLMs to reason over a user's past preferences, background knowledge, and writing style during response generation. The method first generates reasoning paths to bootstrap the model's reasoning ability, then applies Expectation-Maximization Reinforced Self-Training to iteratively fine-tune the model on its own high-reward outputs, jointly improving the coherence of the reasoning and the quality of the generated text. Evaluated on the LongLaMP benchmark of four personalized long-form generation tasks, REST-PG achieves an average relative improvement of 14.5% over state-of-the-art baselines. This work establishes reasoning over personal context as an effective paradigm for personalized long-form text generation.
📝 Abstract
Personalized text generation requires a unique ability of large language models (LLMs) to learn from context that they often do not encounter during their standard training. One way to encourage LLMs to better use personalized context for generating outputs that better align with the user's expectations is to instruct them to reason over the user's past preferences, background knowledge, or writing style. To achieve this, we propose Reasoning-Enhanced Self-Training for Personalized Text Generation (REST-PG), a framework that trains LLMs to reason over personal data during response generation. REST-PG first generates reasoning paths to train the LLM's reasoning abilities and then employs Expectation-Maximization Reinforced Self-Training to iteratively train the LLM based on its own high-reward outputs. We evaluate REST-PG on the LongLaMP benchmark, consisting of four diverse personalized long-form text generation tasks. Our experiments demonstrate that REST-PG achieves significant improvements over state-of-the-art baselines, with an average relative performance gain of 14.5% on the benchmark.
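The self-training loop the abstract describes — sample responses from the current model, keep only the high-reward ones (E-step), then retrain the model on those kept samples (M-step), and repeat — can be sketched with a toy numeric policy. Everything below is an illustrative stand-in, not the paper's implementation: the "policy" is a single probability, and `reward` replaces the paper's actual scoring of personalized outputs.

```python
import random

random.seed(0)

# Toy stand-in for an LLM policy: the probability of emitting a
# high-reward response. In REST-PG this would be the LLM's weights.
policy = {"p_good": 0.2}

def sample_output(policy):
    """Sample one response from the current policy (E-step sampling)."""
    return "good" if random.random() < policy["p_good"] else "bad"

def reward(output):
    """Illustrative reward model: score a sampled response."""
    return 1.0 if output == "good" else 0.0

def m_step(policy, kept, lr=0.5):
    """M-step: move the policy toward its own high-reward samples
    (a stand-in for fine-tuning the LLM on the filtered set)."""
    if not kept:
        return policy  # nothing cleared the reward threshold this round
    frac_good = sum(1 for o in kept if o == "good") / len(kept)
    policy["p_good"] += lr * (frac_good - policy["p_good"])
    return policy

def rest_iteration(policy, n_samples=256, threshold=0.5):
    """One EM-style reinforced self-training round: sample, filter, retrain."""
    samples = [sample_output(policy) for _ in range(n_samples)]
    kept = [s for s in samples if reward(s) >= threshold]
    return m_step(policy, kept)

for _ in range(3):  # a few self-training rounds
    policy = rest_iteration(policy)

print(round(policy["p_good"], 2))
```

Each round the policy is updated only on samples it generated itself that exceed the reward threshold, so its probability of producing high-reward output rises across iterations — the same improvement dynamic the paper's full-scale training exploits.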