Names Are All You Need: Effective and Safe Regression Test Selection for Python

📅 2026-05-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenges that Python's dynamic features pose for regression test selection (RTS): dynamic typing makes call graphs imprecise, so call-graph-based RTS can miss affected tests, while eager importing makes file-level dependency analysis overly conservative. The paper proposes NameRTS, the first Python RTS approach based on fine-grained dependency analysis. It models a program as a bipartite graph of code element nodes and name nodes, and selects a test if any modified code element is reachable from the names that test uses, without constructing a call graph. Two pruning strategies, which draw on prior test executions and context information, refine name matching to limit dependency cascades. The contributions include this method and the first Python RTS dataset with a ground truth indicating which test files each commit affects. On this benchmark, NameRTS skips 69.90% of test files on average (outperforming BabelRTS by 146.5%), reduces end-to-end testing time by 45.59% (a 107.7% improvement over BabelRTS), and selects all affected tests for 99.6% of commits, compared with 76.6% for BabelRTS.

📝 Abstract

Regression test selection reduces the cost of regression testing by executing only those tests affected by a code change. Despite extensive study of RTS in statically typed languages, achieving effective and safe RTS in Python is challenging. Python's dynamic typing makes precise call-graph construction difficult, which can cause call-graph-based RTS to miss affected tests. Python's eager importing mechanism, in contrast, renders file-level dependency analysis overly conservative. This paper presents NameRTS, the first Python RTS approach based on fine-grained dependency analysis. NameRTS models a Python program as a bipartite graph of code element nodes and name nodes, with edges capturing definitions and references. RTS is formulated as a reachability problem on this graph: a test is selected if any modified code element is reachable from the names used in that test. This design avoids call-graph construction, enabling a conservative analysis amenable to safety. To control dependency cascades introduced by coarse name matching, NameRTS applies two pruning strategies that leverage prior test executions and context information to refine name matching. To evaluate NameRTS, we construct the first Python RTS dataset with a ground truth indicating which test files are affected by each commit. We compare NameRTS with the best-performing baseline, BabelRTS, an RTS technique based on coarse file-level dependencies. On this benchmark, NameRTS skips 69.90% of test files on average, outperforming BabelRTS by 146.5%. It also reduces end-to-end testing time by 45.59%, yielding a 107.7% improvement over BabelRTS. In terms of safety, NameRTS selects all affected tests for 99.6% of commits, with only rare misses in exceptional cases. In contrast, BabelRTS is safe for 76.6% of commits. These results demonstrate the effectiveness of NameRTS, paving the way for more efficient regression testing in Python.
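The reachability formulation in the abstract can be illustrated with a minimal sketch. This is not the NameRTS implementation: the class `NameGraph`, the function `select_tests`, and all element and test names are hypothetical, and the two pruning strategies are not modeled. The sketch assumes only what the abstract states, namely that code elements and names form a bipartite graph with definition and reference edges, and that a test is selected when a modified code element is reachable from the names it uses.

```python
from collections import defaultdict, deque


class NameGraph:
    """Hypothetical bipartite graph of code elements and names."""

    def __init__(self):
        self.refs = defaultdict(set)  # code element -> names it references
        self.defs = defaultdict(set)  # name -> code elements that define it

    def add_definition(self, element, name):
        self.defs[name].add(element)

    def add_reference(self, element, name):
        self.refs[element].add(name)

    def reachable_elements(self, start):
        """Breadth-first search: element -> referenced name -> defining elements."""
        seen = {start}
        queue = deque([start])
        while queue:
            element = queue.popleft()
            for name in self.refs.get(element, ()):
                for target in self.defs.get(name, ()):
                    if target not in seen:
                        seen.add(target)
                        queue.append(target)
        return seen


def select_tests(graph, tests, modified):
    """Select every test from which at least one modified element is reachable."""
    modified = set(modified)
    return sorted(t for t in tests if graph.reachable_elements(t) & modified)


# Illustrative program: two unrelated elements both define the name "parse".
graph = NameGraph()
graph.add_definition("a.Parser.parse", "parse")
graph.add_definition("b.Loader.parse", "parse")
graph.add_definition("a.tokenize", "tokenize")
graph.add_definition("c.format_report", "format_report")
graph.add_reference("a.Parser.parse", "tokenize")
graph.add_reference("tests/test_parser.py", "parse")
graph.add_reference("tests/test_report.py", "format_report")

tests = ["tests/test_parser.py", "tests/test_report.py"]
selected_for_tokenize = select_tests(graph, tests, ["a.tokenize"])
selected_for_loader = select_tests(graph, tests, ["b.Loader.parse"])
selected_for_report = select_tests(graph, tests, ["c.format_report"])
```

In this toy graph, a change to `b.Loader.parse` selects `tests/test_parser.py` even if the test only ever calls `a.Parser.parse`, because both elements define the same name. That over-selection keeps the analysis conservative, and it is the kind of dependency cascade from coarse name matching that the abstract says the pruning strategies are meant to control.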

Problem

Research questions and friction points this paper is trying to address.

Regression Test Selection

Python

Dynamic Typing

Dependency Analysis

Test Safety

Innovation

Methods, ideas, or system contributions that make the work stand out.

Regression Test Selection

Python

Fine-grained Dependency Analysis