🤖 AI Summary
Existing automated test generation tools (e.g., Pynguin) fail to recognize the complex input constraints of ML library APIs (e.g., PyTorch and TensorFlow), leading to frequent early test failures and limited coverage. To address this, we propose a constraint-guided unit test generation method that jointly leverages static analysis and natural language processing to automatically extract structured constraints—including tensor dimensions, data types, and value ranges—from official documentation. We extend Pynguin to support constraint-aware test input generation. Evaluated on 165 modules from PyTorch and TensorFlow, our approach improves code coverage by up to 63.9% over Pynguin, significantly enhancing test effectiveness and input validity. This work represents the first systematic modeling and test-driven use of the semantic constraints inherent in deep learning library APIs.
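To illustrate the idea of constraint-aware input generation, the sketch below shows how a structured constraint (as might be extracted from API documentation) could drive the construction of a compliant tensor-like input. The constraint schema, field names, and generator are illustrative assumptions for this summary, not PynguinML's actual internal representation:

```python
import random

# Hypothetical structured constraint for one ML API parameter, as might be
# extracted from documentation. Field names are illustrative assumptions.
constraint = {
    "dtype": ["float32", "float64"],  # allowed element types
    "ndim": 2,                        # required tensor rank
    "shape": [(1, 8), (1, 8)],        # (min, max) size per dimension
    "range": (0.0, 1.0),              # allowed value range for elements
}

def generate_input(c, seed=0):
    """Generate a nested-list 'tensor' and dtype satisfying the constraint."""
    rng = random.Random(seed)
    # Pick a concrete size for each dimension within its documented bounds.
    shape = [rng.randint(lo, hi) for lo, hi in c["shape"]]
    lo, hi = c["range"]

    def build(dims):
        if not dims:
            return rng.uniform(lo, hi)  # leaf: a value inside the allowed range
        return [build(dims[1:]) for _ in range(dims[0])]

    return build(shape), rng.choice(c["dtype"])

tensor, dtype = generate_input(constraint)
```

A search-based generator like Pynguin would normally pick arbitrary primitive values here; feeding it constraints of this shape lets it produce inputs that pass the API's validation checks instead of failing immediately.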
📝 Abstract
Machine learning (ML) libraries such as PyTorch and TensorFlow are essential for a wide range of modern applications. Ensuring the correctness of ML libraries through testing is crucial. However, ML APIs often impose strict input constraints involving complex data structures such as tensors. Automated test generation tools such as Pynguin are not aware of these constraints and often create non-compliant inputs. This leads to early test failures and limited code coverage. Prior work has investigated extracting constraints from official API documentation. In this paper, we present PynguinML, an approach that improves the Pynguin test generator to leverage these constraints to generate compliant inputs for ML APIs, enabling more thorough testing and higher code coverage. Our evaluation is based on 165 modules from PyTorch and TensorFlow, comparing PynguinML against Pynguin. The results show that PynguinML significantly improves test effectiveness, achieving up to 63.9% higher code coverage.