🤖 AI Summary
This work investigates decentralized training of autoregressive generative models while preserving performance. To this end, it introduces the Decentralized Discrete Flow Matching objective, which expresses the probability-generating velocity as a linear combination of expert flows, and formally establishes the first decentralized autoregressive generation framework. Experiments on multimodal language models compare two distinct training paradigms: LLaVA, which uses a fixed CLIP vision encoder, and InternVL 2.5-1B, which performs full-parameter fine-tuning (encompassing the ViT, MLP, and LLM components). Both demonstrate that the proposed approach achieves performance comparable to centralized training across multiple benchmarks. These results validate the equivalence of decentralized and centralized training in multimodal settings and provide both theoretical grounding and practical insights for efficient distributed autoregressive generation.
📝 Abstract
We present a theoretical analysis of the decentralization of autoregressive generation. We define the Decentralized Discrete Flow Matching objective by expressing the probability-generating velocity as a linear combination of expert flows. We also conduct experiments demonstrating the equivalence between decentralized and centralized training settings for multimodal language models across a diverse set of benchmarks. Specifically, we compare two distinct paradigms: LLaVA, which uses a fixed CLIP vision encoder, and InternVL 2.5-1B, which performs full-parameter fine-tuning (ViT+MLP+LLM) during the instruction tuning stage.
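The core construction above — a generative velocity formed as a linear combination of expert flows — can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the function `combined_velocity`, the mixture weights, and the toy vocabulary are all assumptions introduced here to show the shape of the linear combination.

```python
import numpy as np

def combined_velocity(expert_velocities, weights):
    """Hypothetical sketch: mix K per-expert probability velocities.

    expert_velocities: shape (K, V) -- K experts over a vocabulary of size V.
    weights: shape (K,), nonnegative mixture weights summing to 1.
    Returns the decentralized velocity as the weighted sum of expert flows.
    """
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0), \
        "mixture weights must be a convex combination"
    # Each row sums to ~0 (probability mass is conserved by a flow),
    # so the convex combination also conserves mass.
    return weights @ np.asarray(expert_velocities, dtype=float)

# Two toy experts over a 3-token vocabulary (values chosen for illustration).
u1 = np.array([0.2, -0.1, -0.1])
u2 = np.array([-0.3, 0.2, 0.1])
u = combined_velocity([u1, u2], [0.5, 0.5])
```

Because each expert velocity sums to zero, any convex combination also sums to zero, which is the basic property that lets decentralized experts jointly define a valid generative flow.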