Multimodal 3D Genome Pre-training

📅 2025-04-12
🤖 AI Summary
Problem: Current 3D genomics research suffers from a semantic disconnect between chromatin structure (e.g., Hi-C contact maps) and epigenetic functional signals, and it lacks a unified multimodal representation. Method: We propose MIX-HIC, the first multimodal foundation model integrating Hi-C–derived 3D chromatin architecture with multi-omics epigenetic signals. It introduces cross-modal interaction and mapping modules, leverages the first large-scale pretraining dataset comprising over one million paired Hi-C–epigenomic samples, and jointly models contact map topology with temporal/spatial patterns of epigenetic signals. Contributions/Results: (1) It achieves an end-to-end unified representation of 3D genome structure and functional semantics; (2) It significantly outperforms state-of-the-art methods on downstream tasks including chromatin state prediction and enhancer–promoter link inference; (3) We publicly release both the model and dataset, establishing a scalable foundational infrastructure for functional interpretation of 3D genome organization.

📝 Abstract
Deep learning has driven significant progress across analytical tasks in 3D genomics. However, a holistic understanding of 3D genomics knowledge remains underexplored. Here, we propose MIX-HIC, the first multimodal foundation model of the 3D genome, which integrates 3D genome structure with epigenomic tracks to obtain unified and comprehensive semantics. For accurate heterogeneous semantic fusion, we design cross-modal interaction and mapping blocks that yield a robust unified representation and an accurate aggregation of 3D genome knowledge. In addition, we introduce the first large-scale dataset of over 1 million paired samples of Hi-C contact maps and epigenomic tracks for high-quality pre-training, enabling the exploration of functional implications in 3D genomics. Extensive experiments show that MIX-HIC significantly surpasses existing state-of-the-art methods on diverse downstream tasks. This work provides a valuable resource for advancing 3D genomics research.
Problem

Research questions and friction points this paper is trying to address.

Integrating 3D genome structure and epigenomic tracks for unified semantics
Achieving accurate heterogeneous semantic fusion in 3D genomics
Providing a large-scale dataset for high-quality pre-training in 3D genomics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal fusion of 3D genome and epigenomic tracks
Cross-modal interaction for unified representation
Large-scale dataset with over 1 million paired samples of Hi-C contact maps and epigenomic tracks
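
The summary above does not describe the internals of the cross-modal interaction block, so the following is only a minimal sketch of one generic way two modalities can exchange information: single-head cross-attention between per-bin Hi-C tokens and per-bin epigenomic tokens. It is not MIX-HIC's architecture. The bin count, track count, embedding width, random projections, and the names `cross_attention`, `hic_tokens`, and `epi_tokens` are all assumptions made for illustration, and the inputs are random stand-ins rather than real data.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each query token attends over all context tokens."""
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
n_bins, n_tracks, d_model = 16, 4, 8  # hypothetical sizes

# Toy symmetric "contact map" and toy epigenomic tracks (random, not real data)
raw = rng.random((n_bins, n_bins))
contact_map = (raw + raw.T) / 2
tracks = rng.random((n_bins, n_tracks))

# Hypothetical linear embeddings: one token per genomic bin for each modality
hic_tokens = contact_map @ rng.normal(size=(n_bins, d_model))
epi_tokens = tracks @ rng.normal(size=(n_tracks, d_model))

def init_proj():
    # Random projection matrix standing in for learned weights
    return rng.normal(size=(d_model, d_model)) / np.sqrt(d_model)

# Each modality queries the other, then a residual connection keeps its own signal
hic_fused, attn_h2e = cross_attention(hic_tokens, epi_tokens, init_proj(), init_proj(), init_proj())
epi_fused, attn_e2h = cross_attention(epi_tokens, hic_tokens, init_proj(), init_proj(), init_proj())

# Concatenate the two updated streams into one per-bin representation
unified = np.concatenate([hic_tokens + hic_fused, epi_tokens + epi_fused], axis=-1)
print(unified.shape)
```

In this sketch the attention is bidirectional, so structural tokens are updated with epigenomic context and vice versa before concatenation. A trained model would learn the projection matrices instead of sampling them.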
Minghao Yang
Artificial Intelligence Thrust, Information Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 511466, China
Pengteng Li
HKUST(GZ) / SZU
MLLM, Object Detection, Event Camera
Yan Liang
Northwestern Polytechnical University
Information fusion, State Estimation, Target tracking
Qianyi Cai
Hong Kong University of Science and Technology (Guangzhou)
Data Mining, Continual Learning, AI4S, Large Language Models
Zhihang Zheng
Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 511466, China
Shichen Zhang
Bioscience and Biomedical Engineering Thrust, System Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 511466, China
Pengfei Zhang
Artificial Intelligence Thrust, Information Hub, Hong Kong University of Science and Technology (Guangzhou), Guangzhou, 511466, China
Zhi-An Huang
City University of Hong Kong (Dongguan)
Artificial Intelligence, Bioinformatics, Medical Image Analysis
Hui Xiong
Senior Scientist, Candela Corporation
Ultrafast dynamics, atomic molecular physics, free electron laser