Partitioner Guided Modal Learning Framework

📅 2025-07-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multimodal learning, disentangling and jointly leveraging modality-specific features and cross-modal interaction features remains challenging. To address this, we propose Partitioner-guided Modal Learning (PgM), a framework that explicitly decomposes each learned modal representation into two subspaces, modality-exclusive (uni-modal) features dedicated to individual modalities and paired-modal features shared across modalities, via a learnable modal partitioner. These subspaces are then processed by dedicated uni-modal and paired-modal learners, which support different learning rates across modalities and partitions as well as flexible distribution adaptation, while a uni-paired modal decoder reconstructs the representation and tailors it to downstream objectives. The key contribution is a learnable modal partitioning mechanism that enables fine-grained feature disentanglement and improves interpretability and generalization across tasks. Evaluated on four mainstream multimodal benchmarks, PgM consistently outperforms strong baselines, transfers to existing models, and yields interpretable visualizations of the learned feature distributions.

📝 Abstract
Multimodal learning benefits from information across multiple modalities, and each learned modal representation can be divided into uni-modal features, which can be learned from uni-modal training, and paired-modal features, which can be learned from cross-modal interaction. Building on this perspective, we propose a partitioner-guided modal learning framework, PgM, which consists of a modal partitioner, a uni-modal learner, a paired-modal learner, and a uni-paired modal decoder. The modal partitioner segments the learned modal representation into uni-modal and paired-modal features. The modal learner incorporates two dedicated components for uni-modal and paired-modal learning. The uni-paired modal decoder reconstructs the modal representation from the uni-modal and paired-modal features. PgM offers three key benefits: 1) thorough learning of uni-modal and paired-modal features; 2) flexible distribution adjustment for uni-modal and paired-modal representations to suit diverse downstream tasks; and 3) different learning rates across modalities and partitions. Extensive experiments demonstrate the effectiveness of PgM across four multimodal tasks and further highlight its transferability to existing models. Additionally, we visualize the distribution of uni-modal and paired-modal features across modalities and tasks, offering insight into their respective contributions.
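The abstract does not specify how the modal partitioner is implemented. As a rough illustration only, under the assumption that the partitioner acts as a learnable soft gate over feature dimensions, the split-and-reconstruct idea could be sketched as follows (all names, the gating scheme, and the additive reconstruction are hypothetical, not the paper's actual design):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ModalPartitioner:
    """Hypothetical sketch: a learnable gate splits a modal
    representation h into uni-modal and paired-modal parts.
    The gate logits would be trained end to end in practice."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.gate_logits = rng.normal(size=dim)  # stand-in for learned parameters

    def __call__(self, h):
        g = sigmoid(self.gate_logits)   # soft partition mask, each entry in (0, 1)
        uni = g * h                     # uni-modal feature
        paired = (1.0 - g) * h          # paired-modal feature
        return uni, paired

# Under this additive scheme, summing the two partitions
# recovers the original representation exactly, which is one
# simple way a decoder could reconstruct h from the parts.
partitioner = ModalPartitioner(dim=4)
h = np.ones(4)
uni, paired = partitioner(h)
assert np.allclose(uni + paired, h)
```

Because the two partitions are produced by complementary masks, separate learners (and learning rates) can be applied to `uni` and `paired` independently, which is the flexibility the abstract describes.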
Problem

Research questions and friction points this paper is trying to address.

How to separate learned multimodal representations into uni-modal and paired-modal components
How to flexibly adjust those representations for diverse downstream tasks
How to improve multimodal learning with partition-specific learning rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modal partitioner segments representations into uni-modal and paired-modal features
Dedicated learners for uni-modal and paired-modal features
Decoder reconstructs the modal representation from uni-modal and paired-modal features