🤖 AI Summary
This work addresses the limitations of existing workflow optimization methods, which treat workflow construction as a static, one-shot code generation task and thus struggle to support dynamic multi-turn reasoning. To overcome this, the paper proposes Workflow-R1, a framework that reframes workflow construction as a natural language–based, multi-turn sequential decision-making process, and introduces Group Sub-sequence Policy Optimization (GSsPO) as its reinforcement learning algorithm. By defining composite sub-sequences—aligned with "Think-Action" semantic boundaries—as the optimization units, GSsPO resolves the mismatch between single-step action granularity and the hierarchical structure of multi-turn tasks, and incorporates a structure-aware policy gradient update mechanism. Extensive experiments show that Workflow-R1 significantly outperforms competitive baselines across multiple question-answering benchmarks, confirming its effectiveness and generalization in complex multi-turn reasoning scenarios.
📝 Abstract
The rapid evolution of agentic workflows has demonstrated the strong performance of LLM-based agents on complex reasoning tasks. However, existing workflow optimization methods typically formulate workflow synthesis as a static, one-shot, code-centric generation problem. This paradigm places excessive demands on the model's coding capabilities and restricts the flexibility required for dynamic problem-solving. In this paper, we present Workflow-R1, a framework that reformulates workflow construction as a multi-turn, natural language-based sequential decision-making process. To resolve the optimization granularity mismatch inherent in such multi-turn interactions, we introduce Group Sub-sequence Policy Optimization (GSsPO). Although explicitly tailored to the interleaved Think-Action dynamics of agentic reasoning, GSsPO is fundamentally a structure-aware RL algorithm that generalizes to a broad class of multi-turn agentic sequential decision-making tasks. By recalibrating the optimization unit to the composite sub-sequence—specifically, the atomic Think-Action cycle—it aligns gradient updates with the semantic boundaries of these interactions, ensuring robust learning in complex multi-turn reasoning tasks. In extensive experiments on multiple QA benchmarks, Workflow-R1 outperforms competitive baselines, validating GSsPO as a generalized solution for sequential reasoning and establishing Workflow-R1 as a promising new paradigm for automated workflow optimization.
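To make the granularity recalibration concrete, here is a minimal sketch of the two ingredients the abstract describes: segmenting a multi-turn trajectory into atomic Think-Action sub-sequences, and assigning group-relative advantages per trajectory so that each sub-sequence, rather than each token or single action, carries the credit signal. This is not the paper's implementation; the `Step` type, the segmentation rule (a new cycle begins at each "think" step), and the GRPO-style mean/std normalization are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kind: str          # "think" or "action" (assumed step labels)
    tokens: list       # token ids emitted for this step


def to_subsequences(trajectory):
    """Group a flat list of steps into atomic Think-Action cycles.

    Assumption: each cycle opens with a 'think' step and absorbs all
    following steps until the next 'think' begins a new cycle. These
    cycles are the optimization units, not individual steps.
    """
    cycles, current = [], []
    for step in trajectory:
        if step.kind == "think" and current:
            cycles.append(current)
            current = []
        current.append(step)
    if current:
        cycles.append(current)
    return cycles


def group_advantages(rewards):
    """GRPO-style group-relative advantage (simplified sketch):
    normalize each trajectory's scalar reward against the group
    mean and standard deviation. The resulting advantage would then
    be broadcast to every Think-Action cycle in that trajectory."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # avoid division by zero for uniform rewards
    return [(r - mean) / std for r in rewards]


# Toy example: one trajectory with two Think-Action cycles,
# and a group of two trajectories with rewards 1.0 and 0.0.
traj = [Step("think", [1]), Step("action", [2]),
        Step("think", [3]), Step("action", [4, 5])]
cycles = to_subsequences(traj)          # two Think-Action cycles
advs = group_advantages([1.0, 0.0])     # [1.0, -1.0]
```

The key design point the sketch illustrates is that gradient updates are aligned with semantic boundaries: the policy loss would be aggregated over each cycle in `cycles` as a unit, so a useful thought followed by a useful tool call is credited (or penalized) together rather than token by token.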