LFM2 Technical Report

📅 2025-11-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the latency, memory, and capability constraints of deploying language models on edge devices, this paper introduces LFM2, a family of efficient on-device Liquid Foundation Models spanning 350M–8.3B parameters (dense 350M, 700M, 1.2B, and 2.6B variants plus an 8.3B mixture-of-experts with 1.5B active parameters), all with 32K context. Methodologically: (i) hardware-in-the-loop architecture search under edge latency and memory budgets yields a compact hybrid backbone that combines gated short convolutions with a small number of grouped-query attention blocks; (ii) training pairs difficulty-ordered curriculum learning with a tempered, decoupled Top-K knowledge distillation objective that avoids support mismatch, followed by a three-stage post-training recipe of supervised fine-tuning, length-normalized preference optimization, and model merging; (iii) multimodal and retrieval variants (LFM2-VL, LFM2-Audio, LFM2-ColBERT) extend the family to vision-language tasks, real-time speech interaction, and multilingual retrieval. Experiments show up to 2× faster CPU prefill and decode than similarly sized models, with LFM2-2.6B reaching 79.56% on IFEval and 82.41% on GSM8K. Weights and deployment packages for ExecuTorch, llama.cpp, and vLLM are publicly released.
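The summary names length-normalized preference optimization without giving the objective. The report's exact formulation is not reproduced here; a minimal sketch of one common length-normalized form (normalizing each response's sequence log-likelihood by its token count before taking the preference margin, in the spirit of SimPO-style losses; the function name and `beta` default are illustrative assumptions, not from the paper) looks like:

```python
import math

def length_normalized_pref_loss(logp_chosen, len_chosen,
                                logp_rejected, len_rejected, beta=0.1):
    """Sketch of a length-normalized preference loss: dividing each
    sequence log-likelihood by its length removes the bias toward
    longer answers before the Bradley-Terry style margin is scored."""
    margin = beta * (logp_chosen / len_chosen - logp_rejected / len_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
```

With equal per-token likelihoods the margin is zero and the loss sits at log 2; it decreases as the chosen response's average per-token log-likelihood pulls ahead of the rejected one.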

📝 Abstract
We present LFM2, a family of Liquid Foundation Models designed for efficient on-device deployment and strong task capabilities. Using hardware-in-the-loop architecture search under edge latency and memory constraints, we obtain a compact hybrid backbone that combines gated short convolutions with a small number of grouped query attention blocks, delivering up to 2x faster prefill and decode on CPUs compared to similarly sized models. The LFM2 family covers 350M-8.3B parameters, including dense models (350M, 700M, 1.2B, 2.6B) and a mixture-of-experts variant (8.3B total, 1.5B active), all with 32K context length. LFM2's training pipeline includes a tempered, decoupled Top-K knowledge distillation objective that avoids support mismatch; curriculum learning with difficulty-ordered data; and a three-stage post-training recipe of supervised fine-tuning, length-normalized preference optimization, and model merging. Pre-trained on 10-12T tokens, LFM2 models achieve strong results across diverse benchmarks; for example, LFM2-2.6B reaches 79.56% on IFEval and 82.41% on GSM8K. We further build multimodal and retrieval variants: LFM2-VL for vision-language tasks, LFM2-Audio for speech, and LFM2-ColBERT for retrieval. LFM2-VL supports tunable accuracy-latency tradeoffs via token-efficient visual processing, while LFM2-Audio separates audio input and output pathways to enable real-time speech-to-speech interaction competitive with models 3x larger. LFM2-ColBERT provides a low-latency encoder for queries and documents, enabling high-performance retrieval across multiple languages. All models are released with open weights and deployment packages for ExecuTorch, llama.cpp, and vLLM, making LFM2 a practical base for edge applications that need fast, memory-efficient inference and strong task capabilities.
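The abstract's "tempered, decoupled Top-K knowledge distillation objective that avoids support mismatch" can be illustrated with a small single-position sketch: restricting the target to the teacher's top-K tokens and renormalizing both distributions means the student is never penalized on tokens outside the teacher's effective support. This is a simplified reading, not the report's exact objective (the "decoupled" target/non-target split is omitted); the function name and defaults are assumptions.

```python
import math

def softmax(xs, tau=1.0):
    zs = [x / tau for x in xs]
    m = max(zs)                       # subtract max for numerical stability
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]

def topk_kd_loss(teacher_logits, student_logits, k=8, tau=2.0):
    """Tempered Top-K distillation sketch: cross-entropy between the
    teacher's renormalized top-K distribution and the student's
    distribution over the same K tokens, at temperature tau."""
    idx = sorted(range(len(teacher_logits)),
                 key=lambda i: teacher_logits[i], reverse=True)[:k]
    p = softmax([teacher_logits[i] for i in idx], tau)  # teacher over top-K
    q = softmax([student_logits[i] for i in idx], tau)  # student, same support
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

Because both distributions are renormalized over the same K tokens, a student that matches the teacher exactly attains the minimum (the teacher's entropy), and any mismatch inside the top-K raises the loss.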
Problem

Research questions and friction points this paper is trying to address.

Developing efficient on-device AI models for edge deployment constraints
Creating compact models with strong task performance across diverse benchmarks
Building multimodal variants for vision, audio and retrieval applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hardware-in-the-loop architecture search for edge deployment
Hybrid backbone combining convolutions with attention blocks
Tempered, decoupled Top-K knowledge distillation objective that avoids support mismatch
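The hybrid backbone's gated short convolutions can be sketched for a single channel: a causal depthwise convolution over a short window, modulated elementwise by a learned gate. The report's operator details are not given in this summary, so the window form, the per-token sigmoid gate, and the parameter names below are illustrative assumptions.

```python
import math

def gated_short_conv(xs, kernel, gate_w):
    """One channel of a gated short-convolution mixer (sketch):
    causal conv over the last len(kernel) tokens, scaled by a
    sigmoid gate computed from the current token."""
    k = len(kernel)
    out = []
    for t, x in enumerate(xs):
        # causal window: positions t, t-1, ..., t-k+1 (zero-padded left)
        window = [xs[t - j] if t - j >= 0 else 0.0 for j in range(k)]
        conv = sum(w * v for w, v in zip(kernel, window))
        gate = 1.0 / (1.0 + math.exp(-gate_w * x))  # per-token sigmoid gate
        out.append(gate * conv)
    return out
```

The appeal for edge CPUs is that each output token touches only a fixed short window, so prefill cost grows linearly with sequence length, while the few grouped-query attention blocks supply global mixing.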