🤖 AI Summary
In multimodal alignment, conventional methods treat all negative samples uniformly, neglecting "ambiguous negatives": those that differ from a positive only in subtle, boundary-critical details. This leads to ill-defined decision boundaries. To address this, the paper proposes Boundary-Aware Curriculum with Local Attention (BACL), the first framework to exploit ambiguous boundary samples as curriculum signals. BACL combines a boundary-aware negative sampling strategy, which progressively raises sample difficulty, with a differentiable contrastive local attention loss that localizes where a mismatch occurs; both modules are fully differentiable, require no extra annotation, and are compatible with standard dual-encoder architectures. The authors establish a generalization error bound of *O*(1/*n*) and report up to 32% absolute improvement in R@1 over CLIP across four large-scale benchmarks, setting a new state of the art.
📝 Abstract
Most multimodal models treat every negative pair alike, ignoring the ambiguous negatives that differ from the positive by only a small detail. We propose Boundary-Aware Curriculum with Local Attention (BACL), a lightweight add-on that turns these borderline cases into a curriculum signal. A Boundary-aware Negative Sampler gradually raises difficulty, while a Contrastive Local Attention loss highlights where the mismatch occurs. The two modules are fully differentiable and work with any off-the-shelf dual encoder. Theory predicts a fast O(1/n) error rate; practice shows up to +32% R@1 over CLIP and new SOTA on four large-scale benchmarks, all without extra labels.
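To make the curriculum idea concrete, here is a minimal sketch of what a boundary-aware negative sampler might look like. The function name, the linear difficulty schedule, and the quantile-based similarity ceiling are illustrative assumptions, not the paper's actual implementation: the only property taken from the abstract is that harder (more ambiguous, higher-similarity) negatives are admitted as training progresses.

```python
import numpy as np

def boundary_aware_negatives(sim_to_positive, step, total_steps, k=2):
    """Illustrative sketch (not the paper's code) of curriculum negative sampling.

    sim_to_positive: similarity of each candidate negative to the anchor;
    higher similarity means a more ambiguous, boundary-critical negative.
    """
    # Linear curriculum: difficulty grows from 0 (easy) to 1 (hardest).
    progress = step / total_steps
    # Admit negatives up to a similarity ceiling that rises with progress,
    # so near-boundary negatives only enter late in training.
    ceiling = np.quantile(sim_to_positive, 0.5 + 0.5 * progress)
    eligible = np.where(sim_to_positive <= ceiling)[0]
    # Among eligible negatives, return the k closest to the decision boundary.
    hardest = eligible[np.argsort(sim_to_positive[eligible])[-k:]]
    return hardest
```

Early in training this returns only low-similarity (easy) negatives; by the final steps the ceiling reaches the hardest candidates, so the most ambiguous negatives dominate the contrastive batches.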