Towards Multi-Platform Mutation Testing of Task-based Chatbots

📅 2025-09-01
🤖 AI Summary
Testing task-based chatbots built on platforms such as Dialogflow and Rasa is difficult: generated test suites rarely cover every conversational scenario, so faults in the dialogue logic can go unnoticed. MUTABOT is a mutation testing approach that injects faults into the conversational design of a chatbot, producing faulty variants (mutants) that emulate realistic conversational defects. Its mutation operators target conversational elements such as intents, slots, and state transitions. This paper extends MUTABOT to multiple platforms (Dialogflow and Rasa) and uses the resulting mutants to assess test suites generated by Botium, a state-of-the-art test generator. Mutants that those suites fail to detect reveal weaknesses in the suites, that is, conversational scenarios they do not exercise, and the proportion of detected mutants gives a quantitative measure of test-suite adequacy.

📝 Abstract
Chatbots, also known as conversational agents, have become ubiquitous, offering services in a multitude of domains. Unlike general-purpose chatbots, task-based chatbots are software designed to prioritize the completion of tasks in the domain they handle (e.g., flight booking). Given the growing popularity of chatbots, testing techniques that can generate full conversations as test cases have emerged. Still, thoroughly testing all the possible conversational scenarios implemented by a task-based chatbot is challenging, so incorrect behaviors may remain unnoticed. To address this challenge, we proposed MUTABOT, a mutation testing approach that injects faults in conversations and produces faulty chatbots that emulate defects that may affect the conversational aspects. In this paper, we present our extension of MUTABOT to multiple platforms (Dialogflow and Rasa), and present experiments that show how mutation testing can be used to reveal weaknesses in test suites generated by the Botium state-of-the-art test generator.
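
The idea of mutation testing for chatbots can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the dictionary-based bot format, the two mutation operators (removing a training phrase, swapping two responses), and the helper names are hypothetical and are not MUTABOT's actual operators, nor the Dialogflow, Rasa, or Botium formats. The sketch only shows the general mechanism: derive faulty variants of a bot, run a test suite against each, and treat surviving mutants as evidence of untested conversational scenarios.

```python
from copy import deepcopy

# Hypothetical, platform-neutral bot definition (an assumption for this sketch):
# each intent has training phrases and a fixed response.
BOT = {
    "book_flight": {"phrases": ["book a flight", "i need a flight"],
                    "response": "Where would you like to fly?"},
    "cancel_booking": {"phrases": ["cancel my booking"],
                       "response": "Your booking has been cancelled."},
    "greet": {"phrases": ["hello", "hi"],
              "response": "Hello! How can I help?"},
}

def reply(bot, utterance):
    """Return the response of the first intent whose phrases contain the utterance."""
    for intent in bot.values():
        if utterance.lower() in intent["phrases"]:
            return intent["response"]
    return "Sorry, I did not understand."

def remove_phrase_mutants(bot):
    """Illustrative operator: one mutant per training phrase, with that phrase deleted."""
    for name, intent in bot.items():
        for phrase in intent["phrases"]:
            mutant = deepcopy(bot)
            mutant[name]["phrases"].remove(phrase)
            yield f"remove '{phrase}' from {name}", mutant

def swap_response_mutants(bot):
    """Illustrative operator: one mutant per pair of intents, with responses exchanged."""
    names = list(bot)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            mutant = deepcopy(bot)
            mutant[a]["response"] = bot[b]["response"]
            mutant[b]["response"] = bot[a]["response"]
            yield f"swap responses of {a} and {b}", mutant

def passes(bot, suite):
    """A suite passes when every (utterance, expected response) pair matches."""
    return all(reply(bot, utterance) == expected for utterance, expected in suite)

def mutation_score(bot, suite):
    """Return the fraction of mutants killed by the suite and the surviving mutants."""
    mutants = list(remove_phrase_mutants(bot)) + list(swap_response_mutants(bot))
    survived = [label for label, mutant in mutants if not_killed(mutant, suite)]
    return (len(mutants) - len(survived)) / len(mutants), survived

def not_killed(mutant, suite):
    """A mutant survives when the suite still passes on it."""
    return passes(mutant, suite)

# A small test suite standing in for generated conversations.
SUITE = [
    ("book a flight", "Where would you like to fly?"),
    ("hello", "Hello! How can I help?"),
]

if __name__ == "__main__":
    score, survived = mutation_score(BOT, SUITE)
    print(f"mutation score: {score:.3f}")
    for label in survived:
        print("survived:", label)
```

In this sketch, each surviving mutant names a phrase or intent that the suite never exercises, which is the kind of test-suite weakness the abstract refers to.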

Problem

Research questions and friction points this paper is trying to address.

Testing task-based chatbots for conversational defects
Generating comprehensive test cases for chatbots
Extending mutation testing to multiple chatbot platforms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mutation testing approach for chatbots
Extending MUTABOT to multiple platforms
Injecting faults to reveal test weaknesses