ACE and Diverse Generalization via Selective Disagreement

📅 2025-09-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep neural networks are prone to learning spurious correlations, and when a spurious correlation is complete in the training data, the correct generalization is fundamentally underspecified. To address this, the paper proposes ACE, a concept-learning framework based on selective disagreement: a self-training procedure encourages confident, selective disagreement on novel unlabeled inputs, yielding a set of concepts that are all consistent with the training data but generalize differently. The framework mitigates underspecification without requiring trusted labels or unreliable external metrics, supports straightforward encoding of prior knowledge, and enables principled unsupervised model selection. ACE matches or outperforms existing methods on complete-spurious-correlation benchmarks while remaining robust to incomplete correlations, and in an early application to language-model alignment achieves competitive performance on a measurement tampering detection benchmark.

📝 Abstract
Deep neural networks are notoriously sensitive to spurious correlations - where a model learns a shortcut that fails out-of-distribution. Existing work on spurious correlations has often focused on incomplete correlations, leveraging access to labeled instances that break the correlation. But in cases where the spurious correlations are complete, the correct generalization is fundamentally *underspecified*. To resolve this underspecification, we propose learning a set of concepts that are consistent with training data but make distinct predictions on a subset of novel unlabeled inputs. Using a self-training approach that encourages *confident* and *selective* disagreement, our method ACE matches or outperforms existing methods on a suite of complete-spurious correlation benchmarks, while remaining robust to incomplete spurious correlations. ACE is also more configurable than prior approaches, allowing for straightforward encoding of prior knowledge and principled unsupervised model selection. In an early application to language-model alignment, we find that ACE achieves competitive performance on the measurement tampering detection benchmark *without* access to untrusted measurements. While still subject to important limitations, ACE represents significant progress towards overcoming underspecification.
Problem

Research questions and friction points this paper is trying to address.

Resolving underspecification in complete spurious correlations
Learning concepts making distinct predictions on novel inputs
Overcoming model sensitivity to distributional shortcuts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-training with confident selective disagreement
Learning consistent concepts with distinct predictions
Configurable prior knowledge encoding and model selection
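The core idea above - fit the labeled data while rewarding confident disagreement on selected unlabeled inputs - can be illustrated with a minimal numpy sketch. This is not the paper's actual objective: the two-head setup, the confidence threshold `tau`, and the disagreement weight `lam` are illustrative assumptions chosen to make the mechanism concrete.

```python
import numpy as np

def cross_entropy(p, y):
    # Mean binary cross-entropy of predicted probabilities p against labels y.
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

def selective_disagreement_loss(p1, p2, y, u1, u2, tau=0.9, lam=1.0):
    """Toy two-head objective (illustrative, not the paper's exact loss).

    p1, p2: each head's probabilities on labeled inputs with labels y.
    u1, u2: each head's probabilities on unlabeled inputs.
    Both heads must fit the labeled data; they are rewarded (loss decreases)
    for disagreeing on the *selected* unlabeled points where each head is
    already confident (probability farther than tau from 0.5's complement).
    """
    fit = cross_entropy(p1, y) + cross_entropy(p2, y)
    # Select unlabeled points where BOTH heads are confident.
    confident = (np.abs(u1 - 0.5) > tau - 0.5) & (np.abs(u2 - 0.5) > tau - 0.5)
    if confident.any():
        disagreement = float(np.mean(np.abs(u1[confident] - u2[confident])))
    else:
        disagreement = 0.0
    return fit - lam * disagreement
```

Under this toy loss, a pair of heads that both fit the labeled data but split confidently on the unlabeled subset scores better than a pair that agrees everywhere, which is the sense in which disagreement resolves underspecification by surfacing distinct candidate concepts.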