AI Summary
This work addresses a limitation of existing test smell detection approaches: counting production method invocations often fails to identify tests that verify multiple behaviors, and such tests are harder to read and more tightly coupled to production code. The authors propose a runtime path analysis technique built around a new smell, "Test Obsessed by Method," which complements invocation counts by looking at how many control-flow paths of a single production method a test covers. In a study of 2,054 tests from 12 test suites of the Python Standard Library, they dynamically identified 44 smelly tests in 11 of the suites. These 44 tests could be split into 118 tests, about 2.7 per smelly test on average, with a median of two behaviors verified per test. Notably, 23% of the smelly tests already contained comments acknowledging that distinct behaviors were being tested, which supports the practical relevance of the smell.
Abstract
Best testing practices state that tests should verify a single functionality or behavior of the system. Tests that verify multiple behaviors are harder to understand, lack focus, and are more coupled to the production code. An attempt to identify this issue is the test smell \emph{Eager Test}, which aims to capture tests that verify too much functionality based on the number of production method calls. Unfortunately, prior research suggests that counting production method calls is an inaccurate measure, as these calls do not reliably serve as a proxy for functionality. We envision a complementary solution based on runtime analysis: we hypothesize that some tests that verify multiple behaviors will likely cover multiple paths of the same production methods. Thus, we propose a novel test smell named \emph{Test Obsessed by Method}, a test method that covers multiple paths of a single production method. We provide an initial empirical study to explore the presence of this smell in 2,054 tests provided by 12 test suites of the Python Standard Library. (1) We detect 44 \emph{Tests Obsessed by Methods} in 11 of the 12 test suites. (2) Each smelly test verifies a median of two behaviors of the production method. (3) The 44 smelly tests could be split into 118 novel tests. (4) 23% of the smelly tests have code comments recognizing that distinct behaviors are being tested. We conclude by discussing benefits, limitations, and further research.
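The core idea, a test that covers multiple paths of a single production method, can be illustrated with a minimal Python sketch. This is not the authors' tool: the names `classify`, `test_classify`, `test_negative_only`, `paths_covered`, and `obsessed_methods` are hypothetical, and approximating a "path" by the sequence of executed line offsets recorded through `sys.settrace` is an assumption made only for illustration.

```python
import sys
from collections import defaultdict


def classify(n):
    """Hypothetical production method with two control-flow paths."""
    if n < 0:
        return "negative"
    return "non-negative"


def test_classify():
    """Hypothetical test that verifies two behaviors of classify()."""
    assert classify(-1) == "negative"
    assert classify(1) == "non-negative"


def test_negative_only():
    """One of the focused tests obtained by splitting test_classify()."""
    assert classify(-1) == "negative"


def paths_covered(test, production_funcs):
    """Run `test` and return {function name: set of distinct line paths}.

    Assumption: a path is the tuple of line offsets executed during one call.
    Functions are keyed by name, so name collisions are ignored here.
    """
    codes = {f.__code__ for f in production_funcs}
    paths = defaultdict(set)
    stack = []  # one list of executed line offsets per active production call

    def tracer(frame, event, arg):
        if event == "call":
            if frame.f_code not in codes:
                return None  # skip line tracing for non-production frames
            stack.append([])
        elif event == "line":
            stack[-1].append(frame.f_lineno - frame.f_code.co_firstlineno)
        elif event == "return":
            paths[frame.f_code.co_name].add(tuple(stack.pop()))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        test()
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return dict(paths)


def obsessed_methods(paths):
    """Names of production methods for which the test covered more than one path."""
    return sorted(name for name, seen in paths.items() if len(seen) > 1)


print(obsessed_methods(paths_covered(test_classify, [classify])))
print(obsessed_methods(paths_covered(test_negative_only, [classify])))
```

Under these assumptions, `test_classify` is flagged because it drives `classify` down both branches, while the split test `test_negative_only` covers a single path and is not flagged.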