🤖 AI Summary
Existing information extraction (IE) research predominantly relies on Python-style code simulation, overlooking the synergistic potential of multiple programming languages (PLs) as supervision during supervised fine-tuning (SFT). Method: This paper presents a systematic investigation into leveraging C++, Java, and Python as structured output guidance signals. We propose a function-prompt construction strategy and a virtual-execution mechanism to improve template-modeling efficiency and cross-lingual generalization. Our approach integrates multi-PL template design, large language model (LLM) SFT, and lightweight runtime simulation. Contribution/Results: Evaluated on multiple standard IE benchmarks, our method significantly outperforms single-language (Python-only) baselines, demonstrating both the effectiveness and the robustness of the multi-PL strategy. All code is publicly available.
📄 Abstract
Recent research in information extraction (IE) focuses on utilizing code-style inputs to enhance structured output generation. The intuition is that programming languages (PLs) inherently exhibit greater structural organization than natural languages (NLs), which makes them particularly well suited for IE tasks. Nevertheless, existing research primarily focuses on Python for code-style simulation, overlooking the potential of other widely used PLs (e.g., C++ and Java) during the supervised fine-tuning (SFT) phase. In this research, we propose Multiple Programming Languages with large language models for information extraction (abbreviated as MPL), a novel framework that explores the potential of incorporating different PLs in the SFT phase. Additionally, we introduce function-prompt with virtual running to simulate code-style inputs more effectively and efficiently. Experimental results on a wide range of datasets demonstrate the effectiveness of MPL. Furthermore, we conduct extensive experiments to provide a comprehensive analysis. We have released our code for future research.
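To make the idea of a code-style IE input concrete, the sketch below renders a named entity recognition instance as a Python function definition. This is a hypothetical illustration of what a function-prompt might look like, not the paper's actual template: the function name, docstring layout, and label set are all assumptions. Under "virtual running," such a prompt is never executed; the code skeleton only serves as structured guidance for the LLM to complete.

```python
def build_function_prompt(task: str, text: str, labels: list[str]) -> str:
    """Render an IE instance as a Python-style function definition.

    The returned string is a prompt, not runnable extraction code:
    the LLM is expected to fill in the extraction results after the
    final comment (a hypothetical sketch of the function-prompt idea).
    """
    label_list = ", ".join(f'"{label}"' for label in labels)
    return (
        f"def {task}(text: str) -> dict:\n"
        f'    """Extract entities of types [{label_list}] from text."""\n'
        f'    text = "{text}"\n'
        f"    results = {{}}\n"
        f"    # LLM completes the extraction below\n"
    )

prompt = build_function_prompt(
    "named_entity_recognition",
    "Steve Jobs founded Apple in California.",
    ["person", "organization", "location"],
)
print(prompt)
```

Analogous templates in C++ or Java would keep the same structure (typed signature, docstring-style comment, empty result container) while changing only the surface syntax, which is the kind of multi-PL supervision the paper studies.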