Aligning by Misaligning: Boundary-aware Curriculum Learning for Multimodal Alignment

📅 2025-11-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multimodal alignment, conventional methods treat all negative samples uniformly. They neglect "ambiguous negatives" (those that differ from positives only in subtle, boundary-critical details), which leaves decision boundaries ill-defined. To address this, we propose Boundary-Aware Curriculum with Local Attention (BACL), the first framework to use ambiguous boundary samples as curriculum signals. BACL combines a boundary-aware negative sampling strategy, which raises difficulty progressively, with a differentiable contrastive local attention loss. Both components plug into standard dual-encoder architectures and require no extra annotations. We theoretically establish a generalization error bound of O(1/n). Empirically, BACL achieves up to a 32% absolute improvement in R@1 over CLIP and sets new state-of-the-art results on four large-scale benchmarks.
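The headline results are reported as R@1 (Recall@1). The sketch below shows how this metric is typically computed from an image-text similarity matrix. It is a generic illustration, not the paper's evaluation code. It assumes exactly one ground-truth match per query, stored at the same index. Benchmarks with several captions per image usually count a hit if any correct caption ranks in the top k. The helper name `recall_at_k` is hypothetical.

```python
from typing import List


def recall_at_k(sim: List[List[float]], k: int = 1) -> float:
    """Fraction of queries whose matching candidate ranks in the top k.

    sim[i][j] is the similarity between query i (e.g., an image) and
    candidate j (e.g., a caption). Assumption: the single ground-truth
    match of query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank candidate indices by descending similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)


sim = [
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.8],  # query 1 prefers candidate 2, so it misses at k=1
    [0.1, 0.2, 0.7],
]
print(recall_at_k(sim, k=1))  # 2 of 3 queries hit
print(recall_at_k(sim, k=2))
```

An "absolute" improvement in R@1 is then a difference in percentage points between two such recall values.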


📝 Abstract
Most multimodal models treat every negative pair alike, ignoring the ambiguous negatives that differ from the positive by only a small detail. We propose Boundary-Aware Curriculum with Local Attention (BACL), a lightweight add-on that turns these borderline cases into a curriculum signal. A Boundary-aware Negative Sampler gradually raises difficulty, while a Contrastive Local Attention loss highlights where the mismatch occurs. The two modules are fully differentiable and work with any off-the-shelf dual encoder. Theory predicts a fast O(1/n) error rate; practice shows up to +32% R@1 over CLIP and new SOTA on four large-scale benchmarks, all without extra labels.
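The abstract states that the Boundary-aware Negative Sampler "gradually raises difficulty," but it does not give the sampler's rule. The following is a minimal curriculum sketch under stated assumptions. Each candidate negative has a precomputed similarity to the anchor, and higher similarity means closer to the decision boundary. A window over the similarity-sorted pool slides from easy negatives (progress 0) to the most ambiguous ones (progress 1). The function name and the `progress` and `window` parameters are hypothetical. Unlike the paper's sampler, which is described as differentiable, this version uses hard, non-differentiable selection.

```python
import random
from typing import List, Optional, Sequence


def curriculum_negatives(
    neg_sims: Sequence[float],
    progress: float,
    num_samples: int,
    window: float = 0.3,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Pick negative indices from a difficulty window set by training progress.

    Hypothetical sketch: neg_sims[j] is the anchor-to-negative similarity;
    progress in [0, 1] moves the window from the easiest negatives toward
    the hardest (most boundary-like) ones.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    rng = rng if rng is not None else random.Random(0)
    # Sort candidate indices from easy (low similarity) to hard (high similarity).
    order = sorted(range(len(neg_sims)), key=lambda j: neg_sims[j])
    size = max(1, round(window * len(order)))
    # Slide the window start from 0 to the last valid offset as progress grows.
    start = round(progress * (len(order) - size))
    pool = order[start:start + size]
    return rng.sample(pool, min(num_samples, len(pool)))


sims = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.4, 0.6, 0.5, 0.05]
print(curriculum_negatives(sims, progress=0.0, num_samples=3))  # easiest negatives
print(curriculum_negatives(sims, progress=1.0, num_samples=3))  # most ambiguous negatives
```

A linear schedule for `progress` (for example, epoch divided by total epochs) is the simplest choice. Nothing here is specific to the paper's actual schedule.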
Problem

Distinguishing ambiguous negative pairs from positive multimodal pairs
Improving multimodal alignment through boundary-aware curriculum learning
Enhancing contrastive learning without requiring additional labeled data
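The baseline criticized here is the CLIP-style symmetric contrastive (InfoNCE) objective. In this objective, every off-diagonal pair in a batch enters the same softmax denominator, with no notion of which negatives are ambiguous. The plain-Python sketch below illustrates that standard objective, not BACL itself. It assumes a square similarity matrix whose diagonal holds the matched image-text pairs. The `temperature` default of 0.07 is an illustrative choice.

```python
import math
from typing import List


def log_softmax_at(row: List[float], idx: int) -> float:
    """Numerically stable log-softmax of row, evaluated at position idx."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - lse


def clip_infonce(sim: List[List[float]], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: matched pairs on the diagonal, all others negatives."""
    n = len(sim)
    logits = [[s / temperature for s in row] for row in sim]
    # Image-to-text direction: row i should pick column i.
    i2t = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: column i should pick row i.
    cols = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax_at(cols[i], i) for i in range(n)) / n
    return (i2t + t2i) / 2


print(clip_infonce([[0.9, 0.1], [0.2, 0.8]]))
```

An ambiguous negative (for example, a caption that differs only in one color word) sits in the same denominator as an unrelated caption. This is the gap the boundary-aware curriculum targets.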
Innovation

Boundary-aware curriculum learning for multimodal alignment
Differentiable negative sampler with difficulty progression
Contrastive local attention loss for mismatch localization
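The page says only that the Contrastive Local Attention loss "highlights where the mismatch occurs" and does not give its formula. The sketch below is a hypothetical illustration of the idea, not the authors' loss. It assumes a positive caption and an ambiguous negative caption that are token-aligned (same length, differing in a small detail), each with precomputed per-token similarities to the image. A softmax over per-token similarity gaps concentrates attention on the mismatched token. A margin is then applied to the attention-pooled gap. The names `local_mismatch_attention`, `tau`, and `margin` are hypothetical.

```python
import math
from typing import List, Tuple


def local_mismatch_attention(
    pos_token_sims: List[float],
    neg_token_sims: List[float],
    tau: float = 0.1,
    margin: float = 0.2,
) -> Tuple[List[float], float]:
    """Attention over token-level gaps plus a margin loss on the pooled gap.

    Hypothetical sketch: tokens where the positive caption matches the image
    much better than the negative get the most weight (mismatch localization).
    """
    if len(pos_token_sims) != len(neg_token_sims):
        raise ValueError("captions must be token-aligned")
    gaps = [p - q for p, q in zip(pos_token_sims, neg_token_sims)]
    # Stable softmax over gaps / tau gives the local attention weights.
    m = max(g / tau for g in gaps)
    exps = [math.exp(g / tau - m) for g in gaps]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Hinge on the attention-pooled gap: identical tokens contribute zero gap.
    pooled_gap = sum(w * g for w, g in zip(weights, gaps))
    loss = max(0.0, margin - pooled_gap)
    return weights, loss


# "a red car" (positive) vs "a blue car" (ambiguous negative), toy numbers.
weights, loss = local_mismatch_attention([0.30, 0.40, 0.55], [0.30, 0.30, 0.55])
print([round(w, 3) for w in weights], round(loss, 3))
```

In a real dual encoder these similarities would come from token embeddings, and the weights would let gradients focus on the mismatched span rather than the whole sentence.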
Hua Ye
Nanjing University
Hang Ding
Shanghai Jiao Tong University
Siyuan Chen
University of Bristol
Yiyang Jiang
PhD student, Hong Kong Polytechnic University
Machine Learning, Computer Vision, Vision-Language Understanding, Natural Language Processing
Changyuan Zhang
The University of Hong Kong
Xuan Zhang
Carnegie Mellon University