Precisely Detecting Python Type Errors via LLM-based Unit Test Generation

📅 2025-07-03

🤖 AI Summary

Type errors in Python frequently cause runtime failures, yet existing static analyzers suffer from high false-positive rates, and generic unit test generation struggles to trigger such errors. This paper proposes RTED, an approach that integrates type-constraint analysis, large language model (LLM)-driven type-aware test generation, and reflective dynamic validation to detect type errors precisely. Its key innovations include (1) guiding test generation via iterative constraint solving over inferred type constraints, and (2) leveraging reflection-based execution to verify behavioral consistency across type-annotated and unannotated code paths. Evaluated on the BugsInPy and TypeBugs benchmarks, RTED detects 22–29 more benchmarked type errors than four state-of-the-art techniques and improves precision by 173.9%–245.9%. Furthermore, RTED identifies 12 previously unknown type errors in six real-world open-source Python projects.

📝 Abstract

Type errors in Python often lead to runtime failures, posing significant challenges to software reliability and developer productivity. Existing static analysis tools aim to detect such errors without execution but frequently suffer from high false positive rates. Recent unit test generation techniques show great promise in achieving high test coverage, but they often struggle to produce bug-revealing tests without tailored guidance. To address these limitations, we present RTED, a novel type-aware test generation technique for automatically detecting Python type errors. Specifically, RTED combines step-by-step type constraint analysis with reflective validation to guide the test generation process and effectively suppress false positives. We evaluated RTED on two widely used benchmarks, BugsInPy and TypeBugs. Experimental results show that RTED detects 22–29 more benchmarked type errors than four state-of-the-art techniques. RTED also produces fewer false positives, achieving an improvement of 173.9%–245.9% in precision. Furthermore, RTED discovered 12 previously unknown type errors in six real-world open-source Python projects.
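To make the notion of a bug-revealing test concrete, the sketch below shows a small function with a latent type error and two tests. The test that only exercises the common case passes; the other triggers the failure at runtime. The function `average_length` and both tests are hypothetical illustrations, not taken from the paper or its benchmarks.

```python
def average_length(names):
    """Return the mean length of the given names (hypothetical example)."""
    total = 0
    for name in names:
        # len() raises TypeError for values without a length, such as None.
        total += len(name)
    return total / len(names)


def test_well_typed_input():
    # A coverage-oriented test: every line runs, yet no bug is revealed.
    assert average_length(["ab", "abcd"]) == 3.0


def test_none_element_raises_type_error():
    # A type-aware test: a caller forwarding an optional value exposes the bug.
    try:
        average_length(["ab", None])
    except TypeError:
        return
    raise AssertionError("expected a TypeError for a None element")
```

Both tests reach full line coverage of `average_length`, which is why coverage alone is a weak signal for this class of bug: only the choice of argument types separates the passing test from the failing one.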

Problem

Research questions and friction points this paper is trying to address.

Detect Python type errors with high accuracy

Reduce false positives in type error detection

Generate effective tests for bug revelation

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based unit test generation for Python

Step-by-step type constraint analysis

Reflective validation to reduce false positives
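The summary does not describe how RTED validates a failing test, so the following is only a rough sketch of the general idea of filtering false positives: a `TypeError` counts as a finding only when the test's arguments respect the function's own annotations. This is an assumption made for illustration, not RTED's algorithm, and the names `greet`, `respects_annotations`, and `classify` are hypothetical.

```python
import inspect


def greet(name: str, title: str = None) -> str:
    """Hypothetical function under test: the default title is None."""
    return title + " " + name


def respects_annotations(func, args):
    """Check explicitly passed arguments against simple class annotations."""
    sig = inspect.signature(func)
    bound = sig.bind(*args)
    for param_name, value in bound.arguments.items():
        expected = sig.parameters[param_name].annotation
        if expected is inspect.Parameter.empty:
            continue  # unannotated parameter: nothing to check
        if isinstance(expected, type) and not isinstance(value, expected):
            return False
    return True


def classify(func, args):
    """Run one candidate test input and label the outcome."""
    try:
        func(*args)
    except TypeError:
        if respects_annotations(func, args):
            return "type error"
        return "discarded: ill-typed input"
    return "no error"


print(classify(greet, ("Ada",)))       # type error
print(classify(greet, (42, "Dr")))     # discarded: ill-typed input
print(classify(greet, ("Ada", "Dr")))  # no error
```

The second call also raises `TypeError`, but only because the test passed an `int` where a `str` is declared, so reporting it would blame the function for the test's own mistake.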
