AI Summary
This paper addresses implicit bias in vision models arising from dataset biases (e.g., CelebA), proposing Attention-IoU, a novel internal representation bias metric grounded in attention maps. Unlike conventional external evaluations that rely on subgroup accuracy, Attention-IoU quantifies the spatial overlap (Intersection-over-Union) between attention regions and features correlated with sensitive attributes (e.g., gender), thereby exposing unannotated confounders and non-explicit bias pathways (e.g., a "gender→glasses" coupling). Key contributions include: (i) the first systematic use of attention maps to disentangle multidimensional attribute coupling biases; (ii) identification of latent bias sources beyond annotated labels in CelebA; and (iii) empirical validation on the synthetic Waterbirds dataset and controlled resampling experiments, demonstrating statistically significant improvements over existing bias detection methods.
Abstract
Computer vision models have been shown to exhibit and amplify biases across a wide array of datasets and tasks. Existing methods for quantifying bias in classification models primarily focus on dataset distribution and model performance on subgroups, overlooking a model's internal workings. We introduce the Attention-IoU (Attention Intersection over Union) metric and related scores, which use attention maps to reveal biases within a model's internal representations and to identify the image features potentially causing those biases. First, we validate Attention-IoU on the synthetic Waterbirds dataset, showing that the metric accurately measures model bias. We then analyze the CelebA dataset, finding that Attention-IoU uncovers correlations beyond accuracy disparities. By investigating individual attributes in relation to the protected attribute Male, we examine the distinct ways biases are represented in CelebA. Lastly, by subsampling the training set to change attribute correlations, we demonstrate that Attention-IoU reveals potential confounding variables not present in dataset labels.
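The core quantity behind the metric, the spatial overlap between two attention heatmaps, can be sketched as a soft (continuous) IoU. This is an illustrative sketch only: the element-wise min/max relaxation and the normalization step are common choices for comparing heatmaps, and the paper's exact formulation of Attention-IoU and its derived scores may differ.

```python
import numpy as np

def attention_iou(map_a: np.ndarray, map_b: np.ndarray) -> float:
    """Soft Intersection-over-Union between two non-negative attention maps.

    Element-wise minimum plays the role of the intersection and element-wise
    maximum the role of the union, a standard continuous relaxation of IoU.
    Illustrative sketch; not necessarily the paper's exact formulation.
    """
    # Normalize each map to sum to 1 so the comparison is scale-invariant.
    a = map_a / map_a.sum()
    b = map_b / map_b.sum()
    intersection = np.minimum(a, b).sum()
    union = np.maximum(a, b).sum()
    return float(intersection / union)

# Identical maps overlap perfectly:
same = np.array([[0.2, 0.8], [0.0, 1.0]])
print(attention_iou(same, same))  # 1.0

# Maps with disjoint support share no mass:
left = np.array([[1.0, 0.0], [1.0, 0.0]])
right = np.array([[0.0, 1.0], [0.0, 1.0]])
print(attention_iou(left, right))  # 0.0
```

With heatmaps like these, a low score between a model's attention for a target attribute and that for a protected attribute (or a feature mask) would indicate little spatial coupling, while a high score flags the kind of correlation the abstract describes.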