Exploring the Potential of Large Language Models in Simulink-Stateflow Mutant Generation

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limitations of traditional mutation testing for Simulink-Stateflow models, which often produces redundant, equivalent, or inexecutable mutants and thereby hinders effective assessment of test adequacy for safety-critical cyber-physical systems. To overcome this challenge, the work introduces large language models (LLMs) into the domain for the first time, proposing an automated mutation-generation pipeline that converts models into structured JSON representations and leverages few-shot prompting with temperature-control strategies. Evaluated on four Simulink-Stateflow models, the approach generated 38,400 mutants up to 13x faster than a manually engineered baseline while substantially reducing the proportion of equivalent and duplicate mutants. The resulting mutant quality consistently surpasses that of the manually designed method, demonstrating gains in both efficiency and effectiveness for mutation-based testing.

📝 Abstract
Mutation analysis is a powerful technique for assessing test-suite adequacy, yet conventional approaches suffer from generating redundant, equivalent, or non-executable mutants. These challenges are particularly amplified in Simulink-Stateflow models due to their hierarchical structure, which integrates continuous dynamics with discrete-event behaviors; such models are widely deployed in safety-critical Cyber-Physical Systems (CPSs). While prior work has explored machine learning and manually engineered mutation operators, these approaches remain constrained by limited training data and scalability issues. Motivated by recent advances in Large Language Models (LLMs), we investigate their potential to generate high-quality, domain-specific mutants for Simulink-Stateflow models. We develop an automated pipeline that converts Simulink-Stateflow models to structured JSON representations and systematically evaluates different mutation and prompting strategies across eight state-of-the-art LLMs. Through a comprehensive empirical study involving 38,400 LLM-generated mutants across four Simulink-Stateflow models, we demonstrate that LLMs generate mutants up to 13x faster than a manually engineered mutation-based baseline while producing significantly fewer equivalent and duplicate mutants and consistently achieving superior mutant quality. Moreover, our analysis reveals that few-shot prompting combined with low-to-medium temperature values yields optimal results. We provide an open-source prototype tool and release our complete dataset to facilitate reproducibility and advance future research in this domain.
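To make the pipeline idea concrete, the sketch below shows how a tiny Stateflow chart might be serialized to JSON and wrapped in a few-shot mutation prompt. This is a minimal illustration only: the JSON schema, the example mutation (a relational-operator replacement), and the prompt wording are all assumptions, not the paper's actual representation or prompts, and the LLM call itself is omitted.

```python
import json

# Hypothetical JSON form of a two-state Stateflow chart.
# Field names ("states", "transitions", "condition") are illustrative,
# not the schema used in the paper.
chart = {
    "states": ["Idle", "Active"],
    "transitions": [
        {"src": "Idle", "dst": "Active", "condition": "speed > 10"},
        {"src": "Active", "dst": "Idle", "condition": "speed <= 10"},
    ],
}

# One few-shot example pairing an original transition condition with a
# mutant (here, a relational-operator replacement).
FEW_SHOT = (
    "Original condition: temp >= 100\n"
    "Mutant condition:   temp > 100\n"
)

def build_prompt(model_json: dict) -> str:
    """Assemble a few-shot mutation prompt over the JSON-serialized model."""
    return (
        "You generate mutants for Simulink-Stateflow models.\n\n"
        "Example mutation:\n" + FEW_SHOT + "\n"
        "Model (JSON):\n" + json.dumps(model_json, indent=2) + "\n\n"
        "Produce one mutated transition condition."
    )

prompt = build_prompt(chart)
# In an actual run, this prompt would be sent to an LLM; per the paper's
# finding, a low-to-medium sampling temperature (e.g. around 0.2-0.5)
# would be passed to the model API.
```

The one-shot example in the prompt is what "few-shot prompting" refers to in the abstract; a real pipeline would include several such original/mutant pairs and parse the model's reply back into the JSON representation.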
Problem

Research questions and friction points this paper is trying to address.

mutation analysis
Simulink-Stateflow
equivalent mutants
redundant mutants
Cyber-Physical Systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Mutation Testing
Simulink-Stateflow
Prompt Engineering
Cyber-Physical Systems