MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the significant performance degradation of current image generation models when handling multiple visual references, primarily attributed to the absence of structured long-context training data. To this end, the authors introduce MacroData, a dataset comprising 400,000 samples, each containing up to ten reference images, systematically organized along four dimensions: customization, illustration, spatial reasoning, and temporal dynamics. Complementing this, they propose MacroBench, a benchmark with 4,000 curated samples for evaluation. Fine-tuning on MacroData substantially improves multi-reference image generation quality, offering the first empirical validation that large-scale structured data and cross-task collaborative training are crucial for model performance. This study thus provides foundational resources—data, benchmark, and methodology—for advancing research in multi-reference image generation.

📝 Abstract
Generating images conditioned on multiple visual references is critical for real-world applications such as multi-subject composition, narrative illustration, and novel view synthesis, yet current models suffer from severe performance degradation as the number of input references grows. We identify the root cause as a fundamental data bottleneck: existing datasets are dominated by single- or few-reference pairs and lack the structured, long-context supervision needed to learn dense inter-reference dependencies. To address this, we introduce MacroData, a large-scale dataset of 400K samples, each containing up to 10 reference images, systematically organized across four complementary dimensions -- Customization, Illustration, Spatial reasoning, and Temporal dynamics -- to provide comprehensive coverage of the multi-reference generation space. Recognizing the concurrent absence of standardized evaluation protocols, we further propose MacroBench, a benchmark of 4,000 samples that assesses generative coherence across graded task dimensions and input scales. Extensive experiments show that fine-tuning on MacroData yields substantial improvements in multi-reference generation, and ablation studies further reveal synergistic benefits of cross-task co-training and effective strategies for handling long-context complexity. The dataset and benchmark will be publicly released.
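To make the data organization described above concrete, here is a minimal sketch of what one MacroData-style training sample might look like. The paper does not publish a schema, so every name below (`MultiRefSample`, `TaskDimension`, the field names) is hypothetical; only the four dimension labels and the 10-reference cap come from the abstract.

```python
from dataclasses import dataclass, field
from enum import Enum

class TaskDimension(Enum):
    # The four complementary dimensions named in the abstract.
    CUSTOMIZATION = "customization"
    ILLUSTRATION = "illustration"
    SPATIAL = "spatial_reasoning"
    TEMPORAL = "temporal_dynamics"

MAX_REFERENCES = 10  # abstract: each sample contains "up to 10 reference images"

@dataclass
class MultiRefSample:
    """Hypothetical record: N reference images plus a prompt and a target image."""
    prompt: str
    dimension: TaskDimension
    reference_paths: list = field(default_factory=list)
    target_path: str = ""

    def __post_init__(self):
        # Enforce the long-context bound the dataset is built around.
        if not (1 <= len(self.reference_paths) <= MAX_REFERENCES):
            raise ValueError(
                f"expected 1..{MAX_REFERENCES} references, "
                f"got {len(self.reference_paths)}"
            )

# Example: a two-subject customization sample.
sample = MultiRefSample(
    prompt="Place both subjects together in a park scene",
    dimension=TaskDimension.CUSTOMIZATION,
    reference_paths=["subject_a.jpg", "subject_b.jpg"],
    target_path="composite.jpg",
)
```

A structure like this makes the benchmark's "graded input scales" easy to express: evaluation buckets simply group samples by `len(reference_paths)`.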
Problem

Research questions and friction points this paper is trying to address.

multi-reference image generation
long-context data
data bottleneck
structured supervision
generative coherence

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-reference image generation
long-context data
structured dataset
benchmark evaluation
cross-task co-training