Exploring Empty Spaces: Human-in-the-Loop Data Augmentation

📅 2024-10-01

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address insufficient coverage of edge cases and limited textual diversity in AI model safety evaluation, this paper proposes Amplio, a human-in-the-loop data augmentation tool that systematically identifies unexplored “empty spaces” in the semantic space of a text dataset to uncover latent risk scenarios. Amplio offers three collaborative augmentation techniques: concept-guided augmentation, interpolation-based augmentation, and LLM-driven augmentation. Each brings red-teaming experts’ domain intuition into the generation loop through interactive visualization, semantic space modeling, controllable text interpolation, and prompt engineering with iterative feedback. In a user study involving 18 professional red-team practitioners, Amplio improved the quality, diversity, and relevance of safety prompts, achieving a 3.2× average increase in augmentation efficiency. The results point to the potential of interactive, expert-in-the-loop workflows for robust and interpretable AI safety assessment.
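
The interpolation idea mentioned in the summary can be illustrated with a minimal sketch. It assumes each prompt is already represented as a fixed-length embedding vector, and it only produces intermediate vectors between two prompts. The function name `interpolate_embeddings` and the `steps` parameter are hypothetical, and turning an interpolated vector back into text (the harder part of any such technique) is out of scope here. This is not presented as Amplio's actual implementation.

```python
def interpolate_embeddings(start, end, steps=3):
    """Return `steps` evenly spaced vectors strictly between `start` and `end`.

    Assumption: `start` and `end` are equal-length sequences of floats,
    e.g. sentence embeddings of two existing prompts.
    """
    if len(start) != len(end):
        raise ValueError("embeddings must have the same dimension")
    if steps < 1:
        raise ValueError("steps must be at least 1")

    points = []
    for i in range(1, steps + 1):
        # t runs over 1/(steps+1), 2/(steps+1), ..., steps/(steps+1),
        # so the two endpoints themselves are never returned.
        t = i / (steps + 1)
        points.append([a + t * (b - a) for a, b in zip(start, end)])
    return points


if __name__ == "__main__":
    # Midpoint between two toy 2-D "embeddings".
    print(interpolate_embeddings([0.0, 0.0], [2.0, 4.0], steps=1))
```

Each returned vector marks a location between two known prompts; a human reviewer or a generative model would still have to propose text that lands near it.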

📝 Abstract

Data augmentation is crucial to make machine learning models more robust and safe. However, augmenting data can be challenging as it requires generating diverse data points to rigorously evaluate model behavior on edge cases and mitigate potential harms. Creating high-quality augmentations that cover these "unknown unknowns" is a time- and creativity-intensive task. In this work, we introduce Amplio, an interactive tool to help practitioners navigate "unknown unknowns" in unstructured text datasets and improve data diversity by systematically identifying empty data spaces to explore. Amplio includes three human-in-the-loop data augmentation techniques: Augment With Concepts, Augment by Interpolation, and Augment with Large Language Model. In a user study with 18 professional red teamers, we demonstrate the utility of our augmentation methods in helping generate high-quality, diverse, and relevant model safety prompts. We find that Amplio enabled red teamers to augment data quickly and creatively, highlighting the transformative potential of interactive augmentation workflows.
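
One simple way to picture "identifying empty data spaces" is a grid over a 2-D projection of the dataset. The sketch below assumes the text has already been embedded and projected to 2-D points (for example with a dimensionality-reduction method), then reports which grid cells inside the data's bounding box contain no points. The function name `find_empty_cells` and the `grid_size` parameter are hypothetical, and the abstract does not say that Amplio works this way; it is an illustration of the general idea only.

```python
def find_empty_cells(points, grid_size=4):
    """Return (col, row) grid cells in the bounding box of `points` that hold no point.

    Assumption: `points` is a non-empty list of (x, y) pairs, e.g. a 2-D
    projection of sentence embeddings.
    """
    if not points:
        raise ValueError("points must not be empty")

    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    # Fall back to a span of 1.0 when all points share a coordinate,
    # which avoids division by zero.
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0

    def cell_of(x, y):
        # Points on the upper boundary are clamped into the last cell.
        col = min(int((x - min_x) / span_x * grid_size), grid_size - 1)
        row = min(int((y - min_y) / span_y * grid_size), grid_size - 1)
        return col, row

    occupied = {cell_of(x, y) for x, y in points}
    return [
        (col, row)
        for col in range(grid_size)
        for row in range(grid_size)
        if (col, row) not in occupied
    ]


if __name__ == "__main__":
    # Two points in opposite corners leave the other two cells of a 2x2 grid empty.
    print(find_empty_cells([(0.0, 0.0), (1.0, 1.0)], grid_size=2))
```

Empty cells found this way are only candidates: a cell can be empty because nothing meaningful belongs there, which is one reason a human in the loop is useful for deciding which gaps are worth filling.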

Problem

Research questions and friction points this paper is trying to address.

Enhances data augmentation for machine learning

Identifies unknown unknowns in text datasets

Improves diversity and quality of model safety prompts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-in-the-loop data augmentation

Interactive tool Amplio

Systematic empty space identification

🔎 Similar Papers

Data augmentation with automated machine learning: approaches and performance comparison with classical data augmentation methods