🤖 AI Summary
To address insufficient coverage of edge cases and limited textual diversity in AI model safety evaluation, this paper proposes Amplio, a human-in-the-loop data augmentation framework that systematically identifies unexplored, empty regions of the semantic space ("blank spaces") to uncover latent risk scenarios. Amplio introduces three collaborative augmentation techniques: concept-guided augmentation, interpolation-based semantic exploration, and LLM-driven augmentation. These techniques bring red-teaming experts' domain intuition into the generation loop through interactive visualization, semantic space modeling, controllable text interpolation, and prompt engineering with iterative feedback. In a user study with 18 professional red-team practitioners, Amplio improved the quality, diversity, and relevance of the generated safety prompts, achieving a 3.2× average increase in augmentation efficiency. The framework points toward more robust, interpretable, expert-guided AI safety assessment.
📝 Abstract
Data augmentation is crucial to make machine learning models more robust and safe. However, augmenting data can be challenging as it requires generating diverse data points to rigorously evaluate model behavior on edge cases and mitigate potential harms. Creating high-quality augmentations that cover these "unknown unknowns" is a time- and creativity-intensive task. In this work, we introduce Amplio, an interactive tool to help practitioners navigate "unknown unknowns" in unstructured text datasets and improve data diversity by systematically identifying empty data spaces to explore. Amplio includes three human-in-the-loop data augmentation techniques: Augment With Concepts, Augment by Interpolation, and Augment with Large Language Model. In a user study with 18 professional red teamers, we demonstrate the utility of our augmentation methods in helping generate high-quality, diverse, and relevant model safety prompts. We find that Amplio enabled red teamers to augment data quickly and creatively, highlighting the transformative potential of interactive augmentation workflows.
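The abstract does not specify how Amplio locates empty data spaces. Purely as an illustration, the sketch below assumes the prompts have already been embedded and projected to 2D (the embedding model and projection method are assumptions). It overlays a uniform grid, reports the cells that contain no points, and returns the existing prompts nearest to an empty cell as seeds that an augmentation step could start from. The function names `find_empty_cells` and `nearest_seeds` are hypothetical. This is not the authors' implementation.

```python
import math


def _bounds(points):
    """Return (min_x, min_y, span_x, span_y) for a list of 2D points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Fall back to a span of 1.0 when every point shares a coordinate.
    return (min(xs), min(ys),
            (max(xs) - min(xs)) or 1.0, (max(ys) - min(ys)) or 1.0)


def find_empty_cells(points, grid_size):
    """Return (row, col) cells of a grid_size x grid_size grid holding no points.

    Hypothetical helper: `points` stands in for a 2D projection of
    prompt embeddings; how that projection is computed is assumed.
    """
    min_x, min_y, span_x, span_y = _bounds(points)
    occupied = set()
    for x, y in points:
        # Scale into [0, grid_size) and clamp the upper edge into the last cell.
        col = min(int((x - min_x) / span_x * grid_size), grid_size - 1)
        row = min(int((y - min_y) / span_y * grid_size), grid_size - 1)
        occupied.add((row, col))
    return [(r, c) for r in range(grid_size) for c in range(grid_size)
            if (r, c) not in occupied]


def nearest_seeds(cell, points, grid_size, k=2):
    """Return the k existing points closest to the center of a grid cell."""
    min_x, min_y, span_x, span_y = _bounds(points)
    row, col = cell
    center = (min_x + (col + 0.5) * span_x / grid_size,
              min_y + (row + 0.5) * span_y / grid_size)
    # sorted() is stable, so ties keep their original input order.
    return sorted(points, key=lambda p: math.dist(p, center))[:k]


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    empty = find_empty_cells(points, grid_size=2)
    print(empty)  # [(0, 1)]
    print(nearest_seeds(empty[0], points, grid_size=2))  # [(0.0, 0.0), (1.0, 1.0)]
```

A fixed grid keeps the example short and deterministic. A real tool would more likely use a density estimate or an interactive view in place of fixed cells, and would turn the seed prompts into new text with a separate step (for example, concept- or LLM-guided generation), which this sketch does not attempt.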