Towards Multi-Platform Mutation Testing of Task-based Chatbots

📅 2025-09-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Testing task-based chatbots built on platforms such as Dialogflow and Rasa is difficult because test suites rarely cover every conversational scenario, so faults in the dialogue logic can go unnoticed. MUTABOT is a mutation testing approach that addresses this by injecting faults into a chatbot's conversations, producing faulty chatbot variants that emulate defects in its conversational behavior. Its mutation operators target conversational elements, for example intent substitution, slot manipulation, and anomalous state transitions. This paper extends MUTABOT to multiple platforms (Dialogflow and Rasa). The experiments run test suites generated by Botium, a state-of-the-art test generator, against the mutated chatbots and show that mutation testing reveals weaknesses in those suites, that is, injected faults the generated tests fail to detect. The share of mutants a test suite detects also gives a quantitative measure of how thoroughly that suite exercises a chatbot.

📝 Abstract

Chatbots, also known as conversational agents, have become ubiquitous, offering services for a multitude of domains. Unlike general-purpose chatbots, task-based chatbots are software designed to prioritize the completion of tasks of the domain they handle (e.g., flight booking). Given the growing popularity of chatbots, testing techniques that can generate full conversations as test cases have emerged. Still, thoroughly testing all the possible conversational scenarios implemented by a task-based chatbot is challenging, resulting in incorrect behaviors that may remain unnoticed. To address this challenge, we proposed MUTABOT, a mutation testing approach for injecting faults in conversations and producing faulty chatbots that emulate defects that may affect the conversational aspects. In this paper, we present our extension of MUTABOT to multiple platforms (Dialogflow and Rasa), and present experiments that show how mutation testing can be used to reveal weaknesses in test suites generated by the Botium state-of-the-art test generator.

Problem

Research questions and friction points this paper is trying to address.

Testing task-based chatbots for conversational defects

Generating comprehensive test cases for chatbots

Extending mutation testing to multiple chatbot platforms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mutation testing approach for chatbots

Extending MUTABOT to multiple platforms

Injecting faults to reveal test weaknesses
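
To make the idea of "injecting faults to reveal test weaknesses" concrete, the following is a minimal Python sketch of mutation testing on a toy chatbot. It is not MUTABOT's implementation and uses no Dialogflow, Rasa, or Botium API: the intent-to-response dictionary, the two mutation operators (`delete_intent`, `swap_responses`), and the test-suite format are all hypothetical simplifications chosen for illustration.

```python
from copy import deepcopy

# Toy chatbot (hypothetical model): maps an intent name to its response.
BOT = {
    "greet": "Hello! How can I help?",
    "book_flight": "Where would you like to fly?",
    "cancel": "Your booking has been cancelled.",
}

FALLBACK = "Sorry, I did not understand."


def reply(bot, intent):
    """Return the bot's response, or the fallback for an unknown intent."""
    return bot.get(intent, FALLBACK)


def delete_intent(bot, intent):
    """Illustrative mutation operator: remove one intent from the bot."""
    mutant = deepcopy(bot)
    del mutant[intent]
    return mutant


def swap_responses(bot, a, b):
    """Illustrative mutation operator: exchange the responses of two intents."""
    mutant = deepcopy(bot)
    mutant[a], mutant[b] = mutant[b], mutant[a]
    return mutant


def generate_mutants(bot):
    """Apply every operator at every applicable location, one fault per mutant."""
    intents = sorted(bot)
    mutants = [delete_intent(bot, i) for i in intents]
    for idx, a in enumerate(intents):
        for b in intents[idx + 1:]:
            mutants.append(swap_responses(bot, a, b))
    return mutants


def kills(test_suite, mutant):
    """A suite kills a mutant if at least one test observes a wrong response."""
    return any(reply(mutant, intent) != expected for intent, expected in test_suite)


def mutation_score(bot, test_suite):
    """Return (killed, total) over all generated mutants."""
    mutants = generate_mutants(bot)
    killed = sum(kills(test_suite, m) for m in mutants)
    return killed, len(mutants)


# A test suite that never exercises the "cancel" intent.
TESTS = [
    ("greet", BOT["greet"]),
    ("book_flight", BOT["book_flight"]),
]

if __name__ == "__main__":
    killed, total = mutation_score(BOT, TESTS)
    print(f"killed {killed} of {total} mutants")
```

In this sketch the mutant that deletes the `cancel` intent survives, because no test ever sends that intent. A surviving mutant of this kind is the signal mutation testing uses to point at a gap in a test suite; adding a test for `cancel` would kill it.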

🔎 Similar Papers

An Exploratory Study on Using Large Language Models for Mutation Testing