An Exploration of Self-Supervised Mutual Information Alignment for Multi-Task Settings

📅 2024-10-02
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
To help align language models with individual attributes and preferences, this paper explores SAMI (Self-Supervised Alignment with Mutual Information), a method that uses conditional mutual information to strengthen the connection between behavioral preferences and model responses. The authors run two multi-task experiments. On MT-Bench, where a stronger model generates training data for a weaker one across categories such as humanities, STEM, coding, and math, one iteration of SAMI achieves a 57% win rate against Direct Preference Optimization (DPO), with significant variation between categories. On GSM-8K, SAMI raises zero-shot accuracy by 1.1%, compared with 3.2% for supervised fine-tuning (SFT); with 10 attempts, SAMI improves accuracy by 3.9% and SFT by 10.1%. Combining SAMI with SFT adds a further 1.3% in the multi-attempt setting, while single-attempt accuracy is unchanged.

📝 Abstract
There is a growing need for pluralistic alignment methods that can steer language models towards individual attributes and preferences. One such method, Self-Supervised Alignment with Mutual Information (SAMI), uses conditional mutual information to encourage the connection between behavioral preferences and model responses. We conduct two experiments exploring SAMI in multi-task settings. First, we compare SAMI to Direct Preference Optimization (DPO) on a multi-task benchmark (MT-Bench), using a stronger model to generate training data for a weaker one across diverse categories (humanities, STEM, extraction, coding, math, reasoning, and roleplay). Our results indicate that one iteration of SAMI has a 57% win rate against DPO, with significant variation in performance between task categories. Second, we examine SAMI's impact on mathematical accuracy (GSM-8K) relative to supervised fine-tuning (SFT). While SAMI increases zero-shot performance by 1.1%, SFT is more effective with a 3.2% boost. However, SAMI shows interesting scaling trends. When given 10 attempts, SAMI improves accuracy by 3.9%, while SFT achieves a 10.1% increase. Combining SAMI with SFT yields an additional improvement of 1.3% in multi-attempt settings, though single-attempt accuracy remains unchanged.
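The multi-attempt figures above can be made concrete with a small sketch. It assumes that a problem counts as solved when at least one of the first k attempts is correct; the abstract does not state the exact scoring rule, so this is an illustrative reading and not the authors' evaluation code. The function name `multi_attempt_accuracy` and the toy results are hypothetical.

```python
from typing import Sequence


def multi_attempt_accuracy(attempts: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of problems with at least one correct answer in the first k attempts.

    Assumption (not confirmed by the source): a problem is "solved" if any
    of its first k sampled answers is correct.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not attempts:
        raise ValueError("attempts must contain at least one problem")
    solved = sum(1 for problem in attempts if any(problem[:k]))
    return solved / len(attempts)


# Hypothetical toy data: 4 problems, 3 attempts each (True = correct answer).
results = [
    [False, True, False],
    [False, False, False],
    [True, True, True],
    [False, False, True],
]

single = multi_attempt_accuracy(results, k=1)  # only the first attempt counts
multi = multi_attempt_accuracy(results, k=3)   # any of three attempts counts
print(f"single-attempt: {single:.2f}, three-attempt: {multi:.2f}")
```

Under this reading, a method can leave single-attempt accuracy unchanged while still raising multi-attempt accuracy, if it makes correct answers appear more often among later samples.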
Problem

Research questions and friction points this paper is trying to address.

Steer language models toward individual attributes and preferences (pluralistic alignment)
Compare SAMI and DPO in multi-task performance benchmarks
Assess SAMI's impact on mathematical accuracy versus SFT
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Supervised Alignment with Mutual Information (SAMI)
Multi-task benchmark comparison with DPO
Combining SAMI with supervised fine-tuning (SFT)
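For reference, the quantity that SAMI is described as using can be written with the standard definition of conditional mutual information. The symbols here are assumptions introduced for illustration and do not come from the source: x is a query, c is a description of behavioral preferences, and y is a model response. This is the textbook definition, not necessarily the exact objective or estimator the authors optimize.

```latex
I(Y; C \mid X)
  = \mathbb{E}_{x,\,c,\,y}\!\left[ \log \frac{p(y \mid c, x)}{p(y \mid x)} \right],
\qquad
p(y \mid x) = \sum_{c} p(c \mid x)\, p(y \mid c, x).
```

The quantity is large when knowing the preferences c makes the response y much more predictable than the query x alone does, which matches the stated goal of tying responses to behavioral preferences.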