🤖 AI Summary
This work addresses the challenge of developing a parameter-efficient, high-performance open-source Mixture-of-Experts (MoE) language model, GLM-4.5, that excels in agentic behavior, complex reasoning, and code generation. We propose a dual-mode hybrid inference mechanism (Chain-of-Thought reasoning vs. direct response) and a multi-stage iterative training framework with MoE expert models, enabling high performance with sparse parameter activation. The model is trained on 23T tokens via multi-phase pretraining, reinforcement learning, and fine-grained post-training. Evaluated on TAU-Bench (70.1%), AIME 2024 (91.0%), and SWE-bench Verified (64.2%), GLM-4.5 ranks third overall and second on agent-centric tasks. To foster research in lightweight large language models, we publicly release both full-scale and compact variants of GLM-4.5.
📄 Abstract
We present GLM-4.5, an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters, featuring a hybrid reasoning method that supports both thinking and direct response modes. Through multi-stage training on 23T tokens and comprehensive post-training with expert model iteration and reinforcement learning, GLM-4.5 achieves strong performance across agentic, reasoning, and coding (ARC) tasks, scoring 70.1% on TAU-Bench, 91.0% on AIME 2024, and 64.2% on SWE-bench Verified. With far fewer parameters than several competitors, GLM-4.5 ranks 3rd overall among all evaluated models and 2nd on agentic benchmarks. We release both GLM-4.5 (355B parameters) and a compact version, GLM-4.5-Air (106B parameters), to advance research in reasoning and agentic AI systems. Code, models, and more information are available at https://github.com/zai-org/GLM-4.5.