🤖 AI Summary
Large language models (LLMs) exhibit systematic failures on specific data subsets, termed "error slices", yet manually identifying such slices is costly and inefficient. To address this, we propose an *active slice discovery* framework that integrates uncertainty sampling with multi-granularity feature representations, enabling efficient error slice detection under low annotation budgets. Our method precisely localizes model weaknesses at either the semantic or the demographic level using only 2–10% of labeled data. Evaluated on multiple real-world slices in toxicity classification, it significantly outperforms baseline approaches in recall and F1 score, demonstrating both effectiveness and generalizability. Our core contribution is the first systematic application of active learning to error slice discovery, balancing interpretability against annotation efficiency and establishing a new paradigm for LLM robustness analysis and targeted model improvement.
📝 Abstract
Large Language Models (LLMs) often exhibit systematic errors on specific subsets of data, known as error slices. For instance, a slice can correspond to a certain demographic on which a model performs poorly at identifying toxic comments about that demographic. Identifying error slices is crucial to understanding and improving models, but it is also challenging. An appealing approach to reduce the amount of manual annotation required is to actively group errors that are likely to belong to the same slice, while using limited access to an annotator to verify whether the chosen samples share the same pattern of model mistakes. In this paper, we formalize this approach as Active Slice Discovery and explore it empirically on the problem of discovering human-defined slices in toxicity classification. We examine the efficacy of active slice discovery under different choices of feature representations and active learning algorithms. On several slices, we find that uncertainty-based active learning algorithms are most effective, achieving competitive accuracy using 2–10% of the available slice membership information, while significantly outperforming baselines.
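To make the loop described in the abstract concrete, the following is a minimal sketch of uncertainty-based active slice discovery: a slice-membership classifier is trained on a small seed of annotated examples, the most uncertain unlabeled points are sent to the annotator each round, and the process repeats under a fixed budget. The synthetic data, the plain gradient-descent logistic regression, and all budget sizes here are illustrative assumptions, not the paper's actual features, model, or datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for model-error examples: a hidden slice (label 1)
# clustered in feature space. In practice, X would be feature
# representations of misclassified inputs and y the (mostly unknown)
# slice-membership labels an annotator can provide on request.
n, d = 500, 8
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 1] > 1.0).astype(float)  # hidden slice membership

def fit_logreg(X, y, steps=500, lr=0.1):
    """Plain logistic regression by gradient descent (slice-membership model)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

# Seed with a tiny random annotated set, then run active rounds that
# query the points whose predicted probability is closest to 0.5.
labeled = list(rng.choice(n, size=10, replace=False))
for _ in range(5):
    w = fit_logreg(X[labeled], y[labeled])
    uncertainty = -np.abs(predict_proba(w, X) - 0.5)  # higher = more uncertain
    uncertainty[labeled] = -np.inf                    # never re-query labels
    labeled += list(np.argsort(uncertainty)[-10:])    # annotate 10 more

# Final slice-membership predictions using 60/500 = 12% of the labels.
pred = predict_proba(fit_logreg(X[labeled], y[labeled]), X) > 0.5
accuracy = (pred == y).mean()
```

Because the queried points concentrate near the decision boundary, the classifier typically recovers the slice with far fewer annotations than labeling uniformly at random; swapping in margin- or entropy-based scores changes only the `uncertainty` line.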