Leveraging LLMs for Automated Translation of Legacy Code: A Case Study on PL/SQL to Java Transformation

📅 2025-08-27

🤖 AI Summary

This study addresses the challenge of modernizing a large PL/SQL legacy system (about 2.5 million lines) to Java when documentation and test cases are largely missing. We propose an LLM-based code translation method that integrates chain-of-guidance reasoning with $n$-shot prompting. The prompt templates are built from a small, manually curated dataset of 10 PL/SQL-to-Java code pairs and 15 Java classes, which together define the domain model for the translated files. In an evaluation on the VT to VTF3 migration, the method guided multiple LLMs to produce syntactically accurate Java that also achieved functional correctness, although the small sample and limited access to test cases restrict how far these results generalize. The approach lays groundwork for scalable, automated migration of large, database-centric legacy systems.

📝 Abstract

The VT legacy system, comprising approximately 2.5 million lines of PL/SQL code, lacks consistent documentation and automated tests, posing significant challenges for refactoring and modernisation. This study investigates the feasibility of using large language models (LLMs) to assist in translating PL/SQL code into Java for the modernised "VTF3" system. Multiple LLMs were evaluated on a dataset of 10 PL/SQL-to-Java code pairs and 15 Java classes, which together establish a domain model for the translated files. Furthermore, we propose a customized prompting strategy that integrates chain-of-guidance reasoning with $n$-shot prompting. Our findings indicate that this methodology effectively guides LLMs in generating syntactically accurate translations while also achieving functional correctness. However, the results are limited by the small number of available code files and by restricted access to the test cases needed to validate the generated code. Nevertheless, they lay the groundwork for scalable, automated solutions in modernising large legacy systems.
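To make the idea of combining guidance steps with $n$-shot examples concrete, the following is a minimal sketch of how such a prompt might be assembled. It is not the authors' implementation: the names `TranslationPair`, `GUIDANCE_STEPS`, and `build_prompt`, the wording of the guidance steps, and the prompt layout are all assumptions made for illustration, since the abstract does not give the actual templates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationPair:
    """One worked example: a PL/SQL source and its hand-written Java equivalent."""
    plsql: str
    java: str


# Hypothetical guidance steps (an assumption): the paper's actual
# chain-of-guidance wording is not given in the abstract.
GUIDANCE_STEPS = [
    "Identify the inputs, outputs, and tables the PL/SQL unit touches.",
    "Map each PL/SQL type or cursor to a class from the domain model below.",
    "Translate the control flow, keeping transaction boundaries unchanged.",
    "Return only the resulting Java class.",
]


def build_prompt(source, examples, domain_classes, n=3):
    """Assemble one prompt: guidance steps, domain model, n examples, then the task."""
    parts = ["You translate PL/SQL into Java.", "", "Follow these steps in order:"]
    parts += [f"{i}. {step}" for i, step in enumerate(GUIDANCE_STEPS, start=1)]
    parts += ["", "Domain model classes:"]
    parts += list(domain_classes)
    # n-shot part: include at most n of the curated PL/SQL-to-Java pairs.
    for i, pair in enumerate(examples[:n], start=1):
        parts += ["", f"Example {i} PL/SQL:", pair.plsql, f"Example {i} Java:", pair.java]
    parts += ["", "PL/SQL to translate:", source, "Java:"]
    return "\n".join(parts)
```

Calling `build_prompt(plsql_source, pairs, java_classes, n=3)` returns a single string that could be sent to whichever LLM is under evaluation. Keeping `n` as a parameter makes it easy to compare zero-shot, one-shot, and few-shot variants on the same input.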

Problem

Research questions and friction points this paper is trying to address.

Automating translation of undocumented legacy PL/SQL to Java

Overcoming refactoring challenges in large undocumented codebases

Ensuring functional correctness in automated code transformation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging LLMs for automated code translation

Customized prompting strategy with chain-of-guidance

Evaluating multiple LLMs for PL/SQL to Java
