MPL: Multiple Programming Languages with Large Language Models for Information Extraction

πŸ“… 2025-05-22
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing information extraction (IE) research predominantly relies on Python-style code simulation, overlooking the synergistic potential of multi-programming-language supervision in supervised fine-tuning (SFT). Method: This paper pioneers a systematic investigation into leveraging C++, Java, and Python as structured output guidance signals. We propose a function-prompt construction strategy and a virtual execution mechanism to enhance template modeling efficiency and cross-lingual generalization. Our approach integrates multi-programming-language (PL) template design, large language model (LLM) SFT, and lightweight runtime simulation. Contribution/Results: Evaluated on multiple standard IE benchmarks, our method significantly outperforms single-language (Python-only) baselines, demonstrating both effectiveness and robustness of the multi-PL strategy. All code is publicly available.

πŸ“ Abstract
Recent research in information extraction (IE) focuses on utilizing code-style inputs to enhance structured output generation. The intuition behind this is that programming languages (PLs) inherently exhibit greater structural organization than natural languages (NLs). This structural advantage makes PLs particularly suited for IE tasks. Nevertheless, existing research primarily focuses on Python for code-style simulation, overlooking the potential of other widely-used PLs (e.g., C++ and Java) during the supervised fine-tuning (SFT) phase. In this research, we propose Multiple Programming Languages with large language models for information extraction (abbreviated as MPL), a novel framework that explores the potential of incorporating different PLs in the SFT phase. Additionally, we introduce function-prompt with virtual running to simulate code-style inputs more effectively and efficiently. Experimental results on a wide range of datasets demonstrate the effectiveness of MPL. Furthermore, we conduct extensive experiments to provide a comprehensive analysis. We have released our code for future research.
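The abstract does not spell out what a function-prompt looks like; the following is a hypothetical sketch of the idea, with illustrative function and entity-type names that are not taken from the paper. An IE instance is wrapped as an unfinished function definition in a chosen PL, and under "virtual running" the model completes the return value directly rather than the code ever being executed.

```python
# Hypothetical function-prompt construction for an NER task.
# Function names, entity types, and templates are illustrative
# assumptions, not the authors' actual templates.

def build_function_prompt(sentence: str, language: str = "python") -> str:
    """Wrap an IE instance as a code-style function prompt in the given PL.

    The model is asked to 'virtually run' the function, i.e. to generate
    the structured return value; no code is actually executed.
    """
    if language == "python":
        return (
            "def named_entity_recognition(text: str) -> dict:\n"
            '    """Extract entities of type person, location, organization."""\n'
            f'    text = "{sentence}"\n'
            "    # virtual run: the model fills in the return value\n"
            "    return"
        )
    if language == "java":
        return (
            "Map<String, List<String>> namedEntityRecognition(String text) {\n"
            "    // Extract entities of type person, location, organization.\n"
            f'    text = "{sentence}";\n'
            "    return"
        )
    raise ValueError(f"unsupported language: {language}")

prompt = build_function_prompt("Barack Obama visited Paris.")
print(prompt)
```

During SFT, the same instance would be rendered in several PLs (Python, Java, C++), giving the model multiple structured views of one extraction target.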
Problem

Research questions and friction points this paper is trying to address.

Exploring multiple programming languages for structured information extraction
Enhancing code-style inputs with function-prompt and virtual running
Improving supervised fine-tuning with diverse PLs beyond Python
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses multiple programming languages for structured output
Introduces function-prompt with virtual running
Enhances information extraction via supervised fine-tuning
πŸ”Ž Similar Papers
No similar papers found.
Bo Li
State Key Laboratory of Intelligent Power Distribution Equipment and System, School of Health Sciences and Biomedical Engineering, Hebei University of Technology
Gexiang Fang
National Engineering Research Center for Software Engineering, Peking University
Wei Ye
National Engineering Research Center for Software Engineering, Peking University
Zhenghua Xu
State Key Laboratory of Intelligent Power Distribution Equipment and System, School of Health Sciences and Biomedical Engineering, Hebei University of Technology
Jinglei Zhang
Shanghai Jiao Tong University
Hao Cheng
School of Artificial Intelligence, Hebei University of Technology
Shikun Zhang
εŒ—δΊ¬ε€§ε­¦