Decentralized Autoregressive Generation

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates decentralized training of autoregressive generative models while preserving performance. To this end, it introduces the Decentralized Discrete Flow Matching objective, which expresses the probability generating velocity as a linear combination of expert flows, and formally establishes the first decentralized autoregressive generation framework. Experiments on multimodal language models compare two paradigms: LLaVA, which pairs a fixed CLIP vision encoder with the language model, and InternVL 2.5-1B, which performs full-parameter fine-tuning of the ViT, MLP, and LLM components. The proposed approach achieves performance comparable to centralized training across multiple benchmarks. These results support the equivalence of decentralized and centralized training in multimodal settings and provide both theoretical grounding and practical guidance for efficient distributed autoregressive generation.

📝 Abstract
We present a theoretical analysis of the decentralization of autoregressive generation. We define the Decentralized Discrete Flow Matching objective by expressing the probability generating velocity as a linear combination of expert flows. We also conduct experiments demonstrating the equivalence between decentralized and centralized training settings for multimodal language models across a diverse set of benchmarks. Specifically, we compare two distinct paradigms: LLaVA, which uses a fixed CLIP vision encoder, and InternVL 2.5-1B, which performs full-parameter fine-tuning (ViT+MLP+LLM) during the instruction tuning stage.
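The core construction in the abstract, writing the probability generating velocity as a linear combination of expert flows, can be sketched numerically. The snippet below is a minimal illustration, not the paper's implementation: `decentralized_velocity`, the expert count, vocabulary size, and the uniform mixing weights are all illustrative assumptions.

```python
import numpy as np

def decentralized_velocity(expert_velocities, weights):
    """Combine per-expert generating velocities as a convex combination.

    expert_velocities: shape (K, V) -- each row is one expert's probability
        velocity over a vocabulary of size V (illustrative setup).
    weights: shape (K,) -- mixing weights, assumed non-negative and summing to 1.
    Returns the combined velocity of shape (V,).
    """
    expert_velocities = np.asarray(expert_velocities, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Weighted sum over experts: sum_k w_k * u_k
    return weights @ expert_velocities

# Toy example: two experts over a 3-token vocabulary, mixed uniformly.
u = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
w = np.array([0.5, 0.5])
v = decentralized_velocity(u, w)  # -> [0.4, 0.25, 0.35]
```

Because the combination is convex, the result remains a valid probability vector whenever each expert's row is one, which is the property that lets the decentralized objective mirror the centralized one.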
Problem

Research questions and friction points this paper is trying to address.

Decentralized Autoregressive Generation
Decentralized Discrete Flow Matching
Multimodal Language Models
Centralized vs Decentralized Training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized Autoregressive Generation
Discrete Flow Matching
Multimodal Language Models
Expert Flows
Full-parameter Fine-tuning