Automating Complex Document Workflows via Stepwise and Rollback-Enabled Operation Orchestration

📅 2025-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses two obstacles to automating complex document workflows: limited step-level control and errors that cannot be undone. It proposes AutoDW, a stepwise, rollback-enabled execution framework that combines instruction-guided action planning, intent-filtered API candidate generation, fine-grained document state tracking, and dual-level rollback (at both the argument and the API level) to support progressive multi-step orchestration and dynamic error correction. The authors also introduce a document-processing benchmark of 250 sessions and 1,708 human-annotated instructions, described as the first to support continuous alignment between user intent and document state over long-horizon tasks. In experiments, AutoDW reaches 90% instruction-level and 62% session-level task completion, improving over baselines by 40% and 76%, respectively, and remains robust across different backbone LLMs and task difficulty levels.

📝 Abstract
Workflow automation promises substantial productivity gains in everyday document-related tasks. While prior agentic systems can execute isolated instructions, they struggle to automate multi-step, session-level workflows because they have limited control over the operational process. To this end, we introduce AutoDW, a novel execution framework that enables stepwise, rollback-enabled operation orchestration. AutoDW incrementally plans API actions conditioned on user instructions, intent-filtered API candidates, and the evolving state of the document. It further employs robust rollback mechanisms at both the argument and API levels, enabling dynamic correction and fault tolerance. Together, these designs keep the execution trajectory of AutoDW aligned with user intent and document context across long-horizon workflows. To assess its effectiveness, we construct a comprehensive benchmark of 250 sessions and 1,708 human-annotated instructions, reflecting realistic document processing scenarios with interdependent instructions. AutoDW achieves 90% and 62% completion rates on instruction- and session-level tasks, respectively, outperforming strong baselines by 40% and 76%. AutoDW also remains robust to the choice of backbone LLM and across tasks of varying difficulty. Code and data will be open-sourced. Code: https://github.com/YJett/AutoDW
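
To make the idea of rollback at both the argument and API levels concrete, here is a minimal Python sketch. It is not the AutoDW implementation: the dict-based document, the two toy APIs, the `run_step` function, and the `check` callback are all hypothetical stand-ins chosen for illustration. The candidate list plays the role of a planner's ranked proposals, and `check` plays the role of verifying that the document state matches the user's intent.

```python
import copy
from typing import Callable, Dict, List, Tuple

Document = Dict[str, object]  # hypothetical toy document, not AutoDW's data model


def set_font_size(doc: Document, args: Dict[str, str]) -> None:
    doc["font_size"] = int(args["size"])  # raises ValueError on a non-numeric size


def set_heading(doc: Document, args: Dict[str, str]) -> None:
    doc["heading"] = str(args["text"])


# Hypothetical API registry; a real system would expose document-editing APIs here.
APIS: Dict[str, Callable[[Document, Dict[str, str]], None]] = {
    "set_font_size": set_font_size,
    "set_heading": set_heading,
}


def run_step(
    doc: Document,
    candidates: List[Tuple[str, List[Dict[str, str]]]],
    check: Callable[[Document], bool],
    log: List[str],
) -> bool:
    """Try each candidate API, and each argument set for it, until check passes."""
    for api_name, arg_options in candidates:
        for args in arg_options:
            snapshot = copy.deepcopy(doc)  # state to restore if this attempt fails
            try:
                APIS[api_name](doc, args)
                if check(doc):
                    log.append(f"ok {api_name} {args}")
                    return True
            except (KeyError, ValueError):
                pass  # a failed call is handled like a failed check
            # Argument-level rollback: restore the document, try the next arguments.
            doc.clear()
            doc.update(snapshot)
            log.append(f"arg-rollback {api_name} {args}")
        # API-level rollback: every argument set failed, so move to the next API.
        log.append(f"api-rollback {api_name}")
    return False


doc: Document = {"font_size": 11, "heading": ""}
log: List[str] = []
step = [
    ("set_heading", [{"text": "14"}]),  # wrong API for the intent
    ("set_font_size", [{"size": "large"}, {"size": "14"}]),  # first arguments invalid
]
ok = run_step(doc, step, lambda d: d["font_size"] == 14, log)
print(ok, doc)
print("\n".join(log))
```

In this run the first API is abandoned after its only argument set fails the check, the second API's first argument set raises an error and is rolled back, and the second argument set succeeds, leaving the document otherwise unchanged. A session-level loop would call `run_step` once per instruction and carry the document state forward.
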
Problem

Research questions and friction points this paper is trying to address.

Automates multi-step document workflows with stepwise orchestration
Enables dynamic correction via rollback mechanisms for fault tolerance
Improves completion rates in realistic document processing scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stepwise API action planning with document state
Robust rollback mechanisms at argument and API levels
Dynamic correction and fault tolerance in workflows
Yanbin Zhang
East China Normal University
Hanhui Ye
East China Normal University
Yue Bai
Northwestern University, Northeastern University
Multi-modal learning, Sparse network training, Mask learning
Qiming Zhang
East China Normal University
Liao Xiang
East China Normal University
Wu Mianzhi
East China Normal University
Renjun Hu
East China Normal University
Robust ML/AI, LLMs, graph mining