FlexOlmo: Open Language Models for Flexible Data Use

📅 2025-07-09
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of collaborative inference under strict data isolation, where distributed data cannot be shared and fine-grained, runtime access control is required. To this end, we propose FlexOlmo, a Mixture-of-Experts (MoE) framework featuring a domain-informed routing mechanism that integrates independently trained expert models without joint training or raw-data sharing. FlexOlmo supports dynamic expert activation/deactivation during inference and enforces fine-grained, policy-driven data access control. Leveraging the FlexMix corpus and a decentralized training protocol, it enables model collaboration while preserving strict data sovereignty. Evaluated across 31 benchmark tasks, FlexOlmo achieves an average 41% relative improvement over baseline models and outperforms existing model merging approaches by 10.1%. Moreover, it surpasses a standard MoE trained without data restrictions at the same training FLOPs, demonstrating superior efficiency and adaptability in privacy-sensitive distributed settings.
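The routing scheme described above lends itself to a compact illustration: experts trained in isolation are wired together through a router whose scores come from per-domain embeddings, and an activation mask switches individual experts on or off per request. The PyTorch sketch below is a minimal illustration under those assumptions; the class and argument names (DomainRoutedMoE, domain_embeds, active_mask, top_k) are hypothetical and do not reproduce FlexOlmo's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DomainRoutedMoE(nn.Module):
    """Illustrative MoE layer: one expert per data owner, router derived from
    fixed per-domain embeddings, experts switchable on/off at inference."""

    def __init__(self, d_model: int, d_ff: int, domain_embeds: torch.Tensor):
        super().__init__()
        n_experts = domain_embeds.shape[0]
        # Each expert is an independent feed-forward block; in the setting
        # described above, each would be trained separately on a closed dataset.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
             for _ in range(n_experts)]
        )
        # Domain-informed router: scores come from fixed domain embeddings
        # rather than from a gate trained jointly over all experts.
        self.register_buffer("router", domain_embeds.clone())

    def forward(self, x: torch.Tensor, active_mask: torch.Tensor, top_k: int = 2) -> torch.Tensor:
        # x: (batch, seq, d_model); active_mask: (n_experts,) bool vector that
        # opts experts (and hence their data owners) in or out per request.
        scores = x @ self.router.T                                  # (batch, seq, n_experts)
        scores = scores.masked_fill(~active_mask, float("-inf"))    # deactivate opted-out experts
        k = min(top_k, int(active_mask.sum()))
        weights, idx = scores.topk(k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(k):
            for e, expert in enumerate(self.experts):
                sel = idx[..., slot] == e                           # tokens routed to expert e
                if sel.any():
                    out[sel] = out[sel] + weights[..., slot][sel].unsqueeze(-1) * expert(x[sel])
        return out


# Example: three experts (public, news, code), with the "code" expert opted out.
layer = DomainRoutedMoE(d_model=16, d_ff=32, domain_embeds=torch.randn(3, 16))
x = torch.randn(2, 5, 16)
y = layer(x, active_mask=torch.tensor([True, True, False]))
```

In this sketch, excluding a data owner's expert at inference time amounts to flipping its entry in active_mask; no parameters are retrained.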

๐Ÿ“ Abstract
We introduce FlexOlmo, a new class of language models (LMs) that supports (1) distributed training without data sharing, where different model parameters are independently trained on closed datasets, and (2) data-flexible inference, where these parameters along with their associated data can be flexibly included or excluded from model inferences with no further training. FlexOlmo employs a mixture-of-experts (MoE) architecture where each expert is trained independently on closed datasets and later integrated through a new domain-informed routing without any joint training. FlexOlmo is trained on FlexMix, a corpus we curate comprising publicly available datasets alongside seven domain-specific sets, representing realistic approximations of closed sets. We evaluate models with up to 37 billion parameters (20 billion active) on 31 diverse downstream tasks. We show that a general expert trained on public data can be effectively combined with independently trained experts from other data owners, leading to an average 41% relative improvement while allowing users to opt out of certain data based on data licensing or permission requirements. Our approach also outperforms prior model merging methods by 10.1% on average and surpasses the standard MoE trained without data restrictions using the same training FLOPs. Altogether, this research presents a solution for both data owners and researchers in regulated industries with sensitive or protected data. FlexOlmo enables benefiting from closed data while respecting data owners' preferences by keeping their data local and supporting fine-grained control of data access during inference.
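The abstract's operational claim is that experts trained independently by different data owners can be assembled into one model with no joint training, and excluded later if licensing or permission requirements change. The sketch below, reusing the hypothetical DomainRoutedMoE layer from the sketch above, shows what such an assembly step could look like; the checkpoint paths and dictionary keys are assumptions for illustration, not FlexOlmo's actual format.

```python
import torch

# Hypothetical artifacts shipped by each data owner: an expert state dict plus
# a domain embedding for the router (paths and keys are illustrative only).
owner_checkpoints = {
    "public": "experts/public_expert.pt",
    "news": "experts/news_expert.pt",
    "code": "experts/code_expert.pt",
}


def assemble_flex_moe(moe_layer: "DomainRoutedMoE") -> "DomainRoutedMoE":
    """Plug independently trained experts into one MoE layer.

    There is no joint training step: expert parameters are only loaded, and
    the router is rebuilt from the owners' domain embeddings.
    """
    embeds = []
    for i, path in enumerate(owner_checkpoints.values()):
        ckpt = torch.load(path, map_location="cpu")
        moe_layer.experts[i].load_state_dict(ckpt["expert"])   # owner-trained FFN weights
        embeds.append(ckpt["domain_embedding"])                # shape: (d_model,)
    moe_layer.router.copy_(torch.stack(embeds))                # domain-informed routing table
    return moe_layer


# Opting a data owner out later (e.g., for licensing reasons) needs no retraining:
# y = moe_layer(x, active_mask=torch.tensor([True, True, False]))  # drop the "code" expert
```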
Problem

Research questions and friction points this paper is trying to address.

How to train model components on closed, distributed datasets without sharing the underlying data
How to flexibly include or exclude data (and its associated parameters) at inference time, with no further training
How to improve performance while respecting data licensing and permission constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed training without data sharing
Data-flexible inference with no retraining
Domain-informed routing in MoE architecture
🔎 Similar Papers
No similar papers found.
Authors

Weijia Shi
University of Washington
Natural Language Processing, Machine Learning

Akshita Bhagia
Allen Institute for AI

Kevin Farhat
Allen Institute for AI

Niklas Muennighoff
Stanford University
large language models, artificial intelligence, machine learning

Pete Walsh
Allen Institute for AI

Jacob Morrison
Allen Institute for AI
natural language processing

Dustin Schwenk
Allen Institute for AI

Shayne Longpre
MIT, Stanford, Apple
Deep Learning, Natural Language Understanding

Jake Poznanski
Allen Institute for AI

Allyson Ettinger
University of Chicago

Daogao Liu
University of Washington
Differential Privacy, Algorithms

Margaret Li
University of Washington

Dirk Groeneveld
Allen Institute for Artificial Intelligence
natural language processing, neural networks, deep learning

Mike Lewis
Facebook AI Research
Natural language processing, machine learning, linguistics

Wen-tau Yih
University of Washington

Luca Soldaini
Allen Institute for AI
Large Language Models, Open Source AI, Information Retrieval

Kyle Lo
Allen Institute for AI
natural language processing, machine learning, human computer interaction, statistics

Noah A. Smith
University of Washington; Allen Institute for Artificial Intelligence
natural language processing, machine learning, computational social science, computer music

Luke Zettlemoyer
University of Washington; Meta
Natural Language Processing, Semantics, Machine Learning, Artificial Intelligence

Pang Wei Koh
University of Washington; Allen Institute for AI
Machine learning, Natural language processing, Computational biology

Hannaneh Hajishirzi
University of Washington; Allen AI
NLP, Language models, AI

Ali Farhadi
University of California, Berkeley

Sewon Min
UC Berkeley EECS & Allen Institute for AI
Natural Language Processing, Machine Learning