PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts

📅 2025-03-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limited generalization and process consistency of large language models (LLMs) in workflow-constrained dialogue tasks (e.g., customer support, equipment maintenance), this paper proposes a structured instruction-tuning method grounded in UML activity diagrams. The authors automatically parse PlantUML activity diagrams into atomic five-tuple dialogue units and construct PFDial, a high-quality Chinese dataset of 12,705 samples. They systematically characterize, for the first time, how decision branches, sequential branches, and backward jumps each affect model performance. With only 800 fine-tuning samples, a 7B-parameter model exceeds 90% accuracy; a 0.5B model trained on the full dataset also surpasses 90%. An 8B model averages an 11.00% absolute accuracy gain over GPT-4o, with peak improvements of 43.88%. The PFDial dataset and code are publicly released.

📝 Abstract
Process-driven dialogue systems, which operate under strict predefined process constraints, are essential in customer service and equipment maintenance scenarios. Although Large Language Models (LLMs) have shown remarkable progress in dialogue and reasoning, they still struggle to solve these strictly constrained dialogue tasks. To address this challenge, we construct the Process Flow Dialogue (PFDial) dataset, which contains 12,705 high-quality Chinese dialogue instructions derived from 440 flowcharts containing 5,055 process nodes. Based on the PlantUML specification, each UML flowchart is converted into atomic dialogue units, i.e., structured five-tuples. Experimental results demonstrate that a 7B model trained with merely 800 samples and a 0.5B model trained on the full dataset can both surpass 90% accuracy. Additionally, the 8B model can surpass GPT-4o by up to 43.88%, with an average gain of 11.00%. We further evaluate models' performance on challenging backward transitions in process flows and conduct an in-depth analysis of various dataset formats to reveal their impact on model performance in handling decision and sequential branches. The data is released at https://github.com/KongLongGeFDU/PFDial.
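The abstract does not spell out the five-tuple schema, so the flowchart-to-dialogue-unit conversion can only be sketched. The parser below and its assumed field layout (source node, decision condition, branch label, target node, node type) are illustrative assumptions, not the authors' implementation:

```python
import re

# Illustrative sketch only: PFDial converts PlantUML activity diagrams into
# atomic five-tuple dialogue units, but the exact schema is not given in this
# summary. We assume (source_node, condition, branch_label, target_node, node_type).
def parse_activity_diagram(plantuml: str):
    """Parse a minimal PlantUML activity diagram into assumed five-tuples.

    Simplification: the `then` branch is not re-joined to the node after
    `endif`; a full parser would also track merge points and `repeat`/backward
    jumps, which the paper analyzes separately.
    """
    tuples = []
    prev = "start"        # node the next transition starts from
    cond, label = "", ""  # active decision condition and branch label
    decision = None       # (decision node, condition) saved for the `else` arm
    for raw in plantuml.splitlines():
        line = raw.strip()
        if line in ("@startuml", "@enduml", "start", "endif"):
            continue
        m = re.match(r"if \((.+?)\) then \((.+?)\)", line)
        if m:                              # decision branch opens
            decision = (prev, m.group(1))
            cond, label = m.group(1), m.group(2)
            continue
        m = re.match(r"else \((.+?)\)", line)
        if m and decision:                 # alternative arm of the decision
            prev, cond = decision
            label = m.group(1)
            continue
        m = re.match(r":(.+?);", line)
        if m:                              # activity node: emit one transition
            node = m.group(1)
            tuples.append((prev, cond, label, node, "action"))
            prev, cond, label = node, "", ""
            continue
        if line == "stop":
            tuples.append((prev, cond, label, "stop", "terminal"))
    return tuples

diagram = """\
@startuml
start
:Ask for order ID;
if (Order found?) then (yes)
  :Provide status;
else (no)
  :Escalate to agent;
endif
stop
@enduml"""

for t in parse_activity_diagram(diagram):
    print(t)
# The two decision tuples come out as:
# ('Ask for order ID', 'Order found?', 'yes', 'Provide status', 'action')
# ('Ask for order ID', 'Order found?', 'no', 'Escalate to agent', 'action')
```

Each emitted tuple is one atomic dialogue step, which is what makes supervision at the level of individual decision and sequential branches possible.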
Problem

Research questions and friction points this paper is trying to address.

Enhance process-driven dialogue systems under strict constraints.
Address LLMs' limitations in strictly constrained dialogue tasks.
Evaluate model performance on backward transitions and dataset formats.
Innovation

Methods, ideas, or system contributions that make the work stand out.

UML flowcharts converted to structured dialogue units
Process Flow Dialogue dataset with 12,705 instructions
Small models achieve high accuracy with limited data
Ming Zhang
School of Computer Science, Fudan University
Yuhui Wang
School of Computer Science, Fudan University
Yujiong Shen
School of Computer Science, Fudan University
Tingyi Yang
School of Computer Science, Fudan University
Changhao Jiang
School of Computer Science, Fudan University
Yilong Wu
Fudan University
Shihan Dou
Fudan University
Qinhao Chen
School of Computer Science, Fudan University; Graduate School of Arts and Sciences, Columbia University
Zhiheng Xi
Fudan University
Zhihao Zhang
School of Computer Science, Fudan University
Yi Dong
School of Computer Science, Fudan University
Zhen Wang
Douyin Co., Ltd.
Zhihui Fei
Douyin Co., Ltd.
Mingyang Wan
Douyin Co., Ltd.
Tao Liang
Douyin Co., Ltd.
Guojun Ma
Douyin Co., Ltd.
Qi Zhang
School of Computer Science, Fudan University
Tao Gui
School of Computer Science, Fudan University; Institute of Modern Languages and Linguistics, Fudan University
Xuanjing Huang
School of Computer Science, Fudan University