Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression

📅 2026-05-02

🤖 AI Summary

This work addresses the regression-to-the-mean bias that multimodal large language models (MLLMs) exhibit on long-tailed numerical regression tasks, which degrades performance on tail samples. The authors attribute the bias to a lack of cross-sample relational supervision in existing MLLM training paradigms and propose a distribution-aware reinforcement learning framework to supply it. Built on Group Relative Policy Optimization (GRPO), the framework uses a reward based on the Concordance Correlation Coefficient (CCC) to provide batch-level, comparison-based supervision that aligns predicted and ground-truth distributions in correlation, scale, and mean. It is plug-and-play and requires no change to the model architecture. On a unified suite of long-tailed regression benchmarks, the method improves consistently over supervised fine-tuning (SFT) and existing MLLM regression methods, with the largest gains in medium- and few-shot regimes.
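
The two ingredients named in the summary can be illustrated with a minimal sketch. It assumes the textbook definition of Lin's concordance correlation coefficient and the usual GRPO-style standardization of rewards within a group. The function names `ccc` and `group_relative_advantages`, the `eps` stabilizer, and the toy data are hypothetical, and how the paper actually wires the reward into training is not specified in this summary.

```python
import numpy as np


def ccc(pred, target, eps=1e-8):
    """Lin's concordance correlation coefficient between two 1-D arrays.

    Uses population (biased) moments; eps is an assumed numerical stabilizer.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    return 2.0 * cov / (var_p + var_t + (mu_p - mu_t) ** 2 + eps)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


# Toy long-tailed targets: most values are small, one lies far in the tail.
targets = np.array([1.0, 2.0, 2.0, 3.0, 50.0])

# Candidate A reproduces the targets exactly.
faithful = targets.copy()
# Candidate B keeps the ordering but shrinks every value toward the mean,
# mimicking regression-to-the-mean behavior.
shrunk = targets.mean() + 0.2 * (targets - targets.mean())

r_faithful = ccc(faithful, targets)
r_shrunk = ccc(shrunk, targets)

# Treat the two candidates as one group and compare them relative to each other.
advantages = group_relative_advantages([r_faithful, r_shrunk])
print(r_faithful, r_shrunk, advantages)
```

In this toy case both candidates have a Pearson correlation of 1 with the targets, yet the shrunk candidate scores a CCC of about 0.385 because its scale is wrong. After group standardization it receives a negative advantage, which is the kind of signal that would discourage collapsing predictions toward the mean.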

📝 Abstract

Multimodal large language models (MLLMs) struggle with numerical regression under long-tailed target distributions. Token-level supervised fine-tuning (SFT) and point-wise regression rewards bias learning toward high-density regions, leading to regression-to-the-mean behavior and poor tail performance. We identify the lack of cross-sample relational supervision as a key limitation of existing MLLM training paradigms. To address it, we propose a distribution-aware reinforcement learning framework based on Group Relative Policy Optimization (GRPO), which introduces batch-level comparison-based supervision via a Concordance Correlation Coefficient (CCC)-based reward to align predicted and ground-truth distributions in terms of correlation, scale, and mean. The framework is plug-and-play, requiring no architectural modification. Experiments on a unified suite of long-tailed regression benchmarks show consistent improvements over SFT and existing MLLM regression methods, with particularly strong gains in medium- and few-shot regimes.
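
The "correlation, scale, and mean" wording matches the standard decomposition of Lin's concordance correlation coefficient, written out here as a reading aid. The notation is assumed rather than taken from the paper: \hat{y} denotes predictions, y denotes ground truth, and \mu, \sigma, \rho are the batch means, standard deviations, and Pearson correlation.

```latex
\rho_c
  = \frac{2\,\sigma_{\hat{y}y}}
         {\sigma_{\hat{y}}^{2} + \sigma_{y}^{2} + (\mu_{\hat{y}} - \mu_{y})^{2}}
  = \rho \cdot C_b,
\qquad
C_b = \frac{2}{v + 1/v + u^{2}},
\qquad
v = \frac{\sigma_{\hat{y}}}{\sigma_{y}},
\quad
u = \frac{\mu_{\hat{y}} - \mu_{y}}{\sqrt{\sigma_{\hat{y}}\,\sigma_{y}}}
```

Here \rho captures correlation, v captures the scale mismatch, and u captures the mean shift. C_b equals 1 only when v = 1 and u = 0, so predictions that are perfectly correlated with the targets but compressed toward the mean (v < 1) still receive \rho_c < 1.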

Problem

Research questions and friction points this paper is trying to address.

Multimodal Large Language Models

Long-tailed Regression

Distributional Awareness

Imbalanced Regression

Cross-sample Relational Supervision

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distribution-aware Reinforcement Learning

Group Relative Policy Optimization

Concordance Correlation Coefficient