PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts

📅 2025-03-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limited generalization and process consistency of large language models (LLMs) in workflow-constrained dialogue tasks (e.g., customer support, equipment maintenance), this paper proposes a structured instruction-tuning method grounded in UML activity diagrams. The authors automatically parse PlantUML activity diagrams into atomic five-tuple dialogue units and construct PFDial, a high-quality Chinese dataset of 12,705 samples. They systematically characterize, for the first time, how decision branches, sequential branches, and backward jumps each affect model performance. With only 800 fine-tuning samples, a 7B-parameter model exceeds 90% accuracy; a 0.5B model trained on the full dataset also surpasses 90%. An 8B model averages an 11.00% absolute accuracy gain over GPT-4o, with peak improvements of 43.88%. The PFDial dataset and code are publicly released.

📝 Abstract
Process-driven dialogue systems, which operate under strict predefined process constraints, are essential in customer service and equipment maintenance scenarios. Although Large Language Models (LLMs) have shown remarkable progress in dialogue and reasoning, they still struggle to solve these strictly constrained dialogue tasks. To address this challenge, we construct the Process Flow Dialogue (PFDial) dataset, which contains 12,705 high-quality Chinese dialogue instructions derived from 440 flowcharts containing 5,055 process nodes. Based on the PlantUML specification, each UML flowchart is converted into atomic dialogue units, i.e., structured five-tuples. Experimental results demonstrate that a 7B model trained with merely 800 samples and a 0.5B model trained on the full dataset can both surpass 90% accuracy. Additionally, the 8B model can surpass GPT-4o by up to 43.88%, with an average gain of 11.00%. We further evaluate models' performance on challenging backward transitions in process flows and conduct an in-depth analysis of various dataset formats to reveal their impact on model performance in handling decision and sequential branches. The data is released at https://github.com/KongLongGeFDU/PFDial.
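The abstract does not spell out the five-tuple schema, so the flowchart-to-dialogue-unit conversion can only be sketched. The parser below and its assumed field layout (source node, decision condition, branch label, target node, node type) are illustrative assumptions, not the authors' implementation:

```python
import re

# Illustrative sketch only: PFDial converts PlantUML activity diagrams into
# atomic five-tuple dialogue units, but the exact schema is not given in this
# summary. We assume (source_node, condition, branch_label, target_node, node_type).
def parse_activity_diagram(plantuml: str):
    """Parse a minimal PlantUML activity diagram into assumed five-tuples.

    Simplification: the `then` branch is not re-joined to the node after
    `endif`; a full parser would also track merge points and `repeat`/backward
    jumps, which the paper analyzes separately.
    """
    tuples = []
    prev = "start"        # node the next transition starts from
    cond, label = "", ""  # active decision condition and branch label
    decision = None       # (decision node, condition) saved for the `else` arm
    for raw in plantuml.splitlines():
        line = raw.strip()
        if line in ("@startuml", "@enduml", "start", "endif"):
            continue
        m = re.match(r"if \((.+?)\) then \((.+?)\)", line)
        if m:                              # decision branch opens
            decision = (prev, m.group(1))
            cond, label = m.group(1), m.group(2)
            continue
        m = re.match(r"else \((.+?)\)", line)
        if m and decision:                 # alternative arm of the decision
            prev, cond = decision
            label = m.group(1)
            continue
        m = re.match(r":(.+?);", line)
        if m:                              # activity node: emit one transition
            node = m.group(1)
            tuples.append((prev, cond, label, node, "action"))
            prev, cond, label = node, "", ""
            continue
        if line == "stop":
            tuples.append((prev, cond, label, "stop", "terminal"))
    return tuples

diagram = """\
@startuml
start
:Ask for order ID;
if (Order found?) then (yes)
  :Provide status;
else (no)
  :Escalate to agent;
endif
stop
@enduml"""

for t in parse_activity_diagram(diagram):
    print(t)
# The two decision tuples come out as:
# ('Ask for order ID', 'Order found?', 'yes', 'Provide status', 'action')
# ('Ask for order ID', 'Order found?', 'no', 'Escalate to agent', 'action')
```

Each emitted tuple is one atomic dialogue step, which is what makes supervision at the level of individual decision and sequential branches possible.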
Problem

Research questions and friction points this paper is trying to address.

Enhance process-driven dialogue systems under strict constraints.
Address LLMs' limitations in strictly constrained dialogue tasks.
Evaluate model performance on backward transitions and dataset formats.
Innovation

Methods, ideas, or system contributions that make the work stand out.

UML flowcharts converted to structured dialogue units
Process Flow Dialogue dataset with 12,705 instructions
Small models achieve high accuracy with limited data
Ming Zhang
School of Computer Science, Fudan University
Yuhui Wang
School of Computer Science, Fudan University
Yujiong Shen
School of Computer Science, Fudan University
Tingyi Yang
School of Computer Science, Fudan University
Changhao Jiang
School of Computer Science, Fudan University
Yilong Wu
Fudan University
Shihan Dou
Fudan University
Qinhao Chen
School of Computer Science, Fudan University; Graduate School of Arts and Sciences, Columbia University
Zhiheng Xi
Fudan University
Zhihao Zhang
School of Computer Science, Fudan University
Yi Dong
School of Computer Science, Fudan University
Zhen Wang
Douyin Co., Ltd.
Zhihui Fei
Douyin Co., Ltd.
Mingyang Wan
Douyin Co., Ltd.
Tao Liang
Douyin Co., Ltd.
Guojun Ma
Douyin Co., Ltd.
Qi Zhang
School of Computer Science, Fudan University
Tao Gui
School of Computer Science, Fudan University; Institute of Modern Languages and Linguistics, Fudan University
Xuanjing Huang
School of Computer Science, Fudan University