Imitation Game: Reproducing Deep Learning Bugs Leveraging an Intelligent Agent

📅 2025-12-16
🤖 AI Summary
Reproducing deep learning (DL) bugs remains challenging because of model nondeterminism and tight coupling with hardware and software environments; manual approaches reliably reproduce only about 3% of DL bugs. This paper introduces RepGen, a novel automated DL bug reproduction framework that integrates large language models (LLMs). Its core contributions are: (1) a learning-enhanced context built from the project, capturing the configuration and behavior relevant to the bug; (2) an iterative generate-validate-refine mechanism that produces bug-reproducing code; and (3) LLM integration into the closed-loop reproduction pipeline, supporting both code generation and validation. Evaluated on 106 real-world DL bugs, RepGen achieves an 80.19% reproduction rate, 19.81 percentage points higher than the state of the art. A developer study with 27 participants shows a 23.35% increase in reproduction success rate, a 56.8% reduction in reproduction time, and lower cognitive load.

📝 Abstract
Despite their wide adoption in various domains (e.g., healthcare, finance, software engineering), Deep Learning (DL)-based applications suffer from many bugs, failures, and vulnerabilities. Reproducing these bugs is essential for their resolution, but it is extremely challenging due to the inherent nondeterminism of DL models and their tight coupling with hardware and software environments. According to recent studies, only about 3% of DL bugs can be reliably reproduced using manual approaches. To address these challenges, we present RepGen, a novel, automated, and intelligent approach for reproducing deep learning bugs. RepGen constructs a learning-enhanced context from a project, develops a comprehensive plan for bug reproduction, employs an iterative generate-validate-refine mechanism, and uses an LLM to generate code that reproduces the bug at hand. We evaluate RepGen on 106 real-world deep learning bugs and achieve a reproduction rate of 80.19%, a 19.81% improvement over the state-of-the-art measure. A developer study involving 27 participants shows that RepGen improves the success rate of DL bug reproduction by 23.35%, reduces the time to reproduce by 56.8%, and lowers participants' cognitive load.
Problem

Research questions and friction points this paper is trying to address.

Automated reproduction of deep learning bugs
Addresses nondeterminism and environment coupling challenges
Improves bug reproduction rates and reduces time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLM to generate bug-reproducing code
Employs iterative generate-validate-refine mechanism
Constructs learning-enhanced context from project
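
The iterative generate-validate-refine mechanism listed above can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not RepGen's actual implementation: the function names, the `Attempt` record, the round limit, and the toy generator and validator are all hypothetical, and the generator is a plain function standing in for an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    code: str
    feedback: str  # validator message; empty when the attempt passed


def generate_validate_refine(
    generate: Callable[[str, str], str],
    validate: Callable[[str], str],
    context: str,
    max_rounds: int = 5,
) -> tuple[Optional[str], list[Attempt]]:
    """Ask `generate` for candidate code until `validate` accepts it.

    `generate(context, feedback)` returns candidate code (a hypothetical
    stand-in for an LLM call); `validate(code)` returns "" on success or
    a message that is fed back into the next round.
    """
    history: list[Attempt] = []
    feedback = ""
    for _ in range(max_rounds):
        code = generate(context, feedback)
        feedback = validate(code)
        history.append(Attempt(code, feedback))
        if not feedback:
            return code, history
    return None, history


# Toy stand-ins so the sketch runs without any model or DL framework.
def toy_generate(context: str, feedback: str) -> str:
    # The first draft forgets the seed; the refined draft adds it.
    if "seed" in feedback:
        return "import random\nrandom.seed(0)\nraise ValueError('shape mismatch')"
    return "raise ValueError('shape mismatch')"


def toy_validate(code: str) -> str:
    # Accept only candidates that fix the seed and raise the reported error.
    if "seed(" not in code:
        return "missing seed: run is not deterministic"
    try:
        exec(code, {})
    except ValueError as err:
        return "" if "shape mismatch" in str(err) else f"wrong error: {err}"
    return "no error raised"


result, history = generate_validate_refine(toy_generate, toy_validate, "bug report text")
```

In this toy run the first candidate is rejected for lacking a fixed seed, and the validator's message drives the second, accepted candidate. Returning the full attempt history alongside the result is a design choice of the sketch; it keeps each round's feedback inspectable.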