Data Dependency Inference for Industrial Code Generation Based on UML Sequence Diagrams

2025-08-05
AI Summary
Ambiguous natural language specifications in industrial-scale code generation impede accurate modeling of implicit data dependencies. Method: This paper proposes UML2Dep, a framework that (1) extends UML sequence diagrams with decision tables and API specifications to represent conditional logic and architectural constraints explicitly; (2) formulates data dependency inference as a constraint satisfaction problem, combining large language model-based mathematical reasoning, static program analysis, and dependency pruning to construct a data dependency graph; and (3) generates code from these formal specifications. Contribution/Results: UML2Dep reduces semantic ambiguity and contextual complexity, which improves the correctness and reliability of the generated code. An empirical evaluation in service-oriented architecture scenarios supports its effectiveness and practical applicability.

Abstract
Large language models (LLMs) excel at generating code from natural language (NL) descriptions. However, plain textual descriptions are inherently ambiguous and often fail to capture complex requirements such as intricate system behaviors, conditional logic, and architectural constraints. In service-oriented architectures in particular, implicit data dependencies are difficult to infer and handle correctly. To bridge this gap, we propose UML2Dep, a step-by-step code generation framework that relies on unambiguous formal specifications of complex requirements. First, we introduce an enhanced Unified Modeling Language (UML) sequence diagram tailored for service-oriented architectures. This diagram extends the traditional visual syntax with decision tables and API specifications, explicitly formalizing the structural relationships and business logic flows in service interactions in order to remove linguistic ambiguity. Second, recognizing the critical role of data flow, we introduce a dedicated data dependency inference (DDI) task. DDI constructs an explicit data dependency graph before any code is synthesized. To make this step reliable, we formalize DDI as a constrained mathematical reasoning task through novel prompting strategies, which plays to LLMs' strengths in mathematical reasoning. Additional static parsing and dependency pruning reduce the context complexity and cognitive load associated with intricate specifications, improving both reasoning accuracy and efficiency.
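To make the DDI task concrete, here is a minimal sketch, not the paper's implementation. It assumes each call in a sequence diagram is described by a hypothetical `Step` record with named input and output fields, and it substitutes exact field-name matching for the LLM-based reasoning the abstract describes. The ordering rule (a step may only depend on an earlier step) is likewise an assumption about what a valid dependency looks like, and all step and field names are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One call in a sequence diagram (hypothetical structure)."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def infer_dependencies(steps: list[Step]) -> dict[tuple[str, str], str]:
    """Map each (consumer step, input field) to the step that produces it.

    Assumption: a field is supplied by the most recent earlier step that
    outputs a field with the same name. Inputs without a producer are
    treated as coming from the original request and are left out.
    """
    latest_producer: dict[str, str] = {}
    edges: dict[tuple[str, str], str] = {}
    for step in steps:
        # Resolve inputs first so a step can never depend on itself.
        for field in step.inputs:
            if field in latest_producer:
                edges[(step.name, field)] = latest_producer[field]
        for field in step.outputs:
            latest_producer[field] = step.name
    return edges


# Hypothetical three-call interaction, listed in diagram order.
steps = [
    Step("QueryUser", inputs=("user_id",), outputs=("account_id", "risk_level")),
    Step("QueryBalance", inputs=("account_id",), outputs=("balance",)),
    Step("CreateOrder", inputs=("account_id", "balance", "amount"), outputs=("order_id",)),
]
edges = infer_dependencies(steps)
for (consumer, field), producer in edges.items():
    print(f"{consumer}.{field} <- {producer}")
```

Building the graph as an explicit mapping before generating any code means each edge can be checked on its own, for example by confirming that the producer really precedes the consumer in the diagram.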
Problem

Research questions and friction points this paper is trying to address.

Infer implicit data dependencies in service-oriented architectures
Generate code from unambiguous formal specifications instead of ambiguous NL
Enhance reasoning accuracy for complex UML sequence diagrams
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhanced UML sequence diagrams with decision tables
Data dependency inference via constrained mathematical reasoning
Static parsing and pruning to reduce complexity
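The pruning idea in the last item can be illustrated with a small sketch. It assumes the dependency graph is stored as a plain dict from each step to the steps it reads data from, and it treats pruning as keeping only the transitive upstream steps of one target. The paper's actual pruning rules are not stated on this page, so this is one plausible reading, and the step names are hypothetical.

```python
def prune_to_target(deps, target):
    """Return the subgraph holding only `target` and its transitive upstream steps.

    `deps` maps a step name to the list of steps it depends on.
    """
    keep = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node in keep:
            continue
        keep.add(node)
        stack.extend(deps.get(node, []))
    # Preserve the original key order and edge lists for the kept steps.
    return {node: list(deps.get(node, [])) for node in deps if node in keep}


# Hypothetical graph: only some steps matter when generating CreateOrder.
deps = {
    "QueryUser": [],
    "QueryBalance": ["QueryUser"],
    "QueryCoupon": ["QueryUser"],
    "SendNotice": ["QueryCoupon"],
    "CreateOrder": ["QueryUser", "QueryBalance"],
}
pruned = prune_to_target(deps, "CreateOrder")
print(pruned)
```

Under this reading, steps such as `QueryCoupon` and `SendNotice` never reach the prompt for `CreateOrder`, which is how pruning would shrink the context the model has to reason over.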
Wenxin Mao
WeChat Pay, Tencent Inc, Shenzhen, China
Zhitao Wang
WeChat Pay AI, Tencent; The Hong Kong Polytechnic University
LLM4SE, Graph Learning, Information Diffusion, NLP
Long Wang
WeChat Pay, Tencent Inc, Shenzhen, China
Sirong Chen
WeChat Pay, Tencent Inc, Shenzhen, China
Cuiyun Gao
The Chinese University of Hong Kong, Hong Kong, China
Luyang Cao
WeChat Pay, Tencent Inc, Shenzhen, China
Ziming Liu
WeChat Pay, Tencent Inc, Shenzhen, China
Qiming Zhang
WeChat Pay, Tencent Inc, Shenzhen, China
Jun Zhou
WeChat Pay, Tencent Inc, Shenzhen, China
Zhi Jin
Sun Yat-Sen University, Associate Professor