🤖 AI Summary
This work proposes A.X K1, a 519-billion-parameter mixture-of-experts (MoE) language model trained from scratch under a constrained computational budget, designed to strengthen multilingual reasoning, Korean in particular, while preserving inference efficiency. Leveraging a corpus of roughly 10 trillion tokens, multi-stage data curation, scaling-law-informed training configurations, and a Think-Fusion training strategy, the model lets users explicitly switch its reasoning mode on or off. Evaluations show that A.X K1 performs competitively with leading open-source models across multiple benchmarks and holds a distinctive advantage on Korean-language tasks, all while maintaining inference efficiency and deployment flexibility.
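The phrase "scaling-law-informed training configurations" is not unpacked here; for orientation, the Chinchilla-style formulation such configurations are typically fit against is sketched below. The exponents shown are the commonly cited approximate values from the scaling-laws literature, not figures reported for A.X K1.

$$
C \approx 6\,N_{\text{act}}\,D, \qquad N^{*}(C) \propto C^{a}, \qquad D^{*}(C) \propto C^{b}, \qquad a \approx b \approx 0.5,
$$

where $C$ is the training compute budget in FLOPs, $D$ the number of training tokens, and $N_{\text{act}}$ the active parameter count per token, which for an MoE model is far smaller than the 519B total.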
📝 Abstract
We introduce A.X K1, a 519B-parameter Mixture-of-Experts (MoE) language model trained from scratch. Our design leverages scaling laws to optimize training configurations and vocabulary size under fixed computational budgets. A.X K1 is pre-trained on a corpus of approximately 10T tokens, curated by a multi-stage data processing pipeline. Designed to bridge the gap between reasoning capability and inference efficiency, A.X K1 supports explicitly controllable reasoning to facilitate scalable deployment across diverse real-world scenarios. We propose a simple yet effective Think-Fusion training recipe, enabling user-controlled switching between thinking and non-thinking modes within a single unified model. Extensive evaluations demonstrate that A.X K1 achieves performance competitive with leading open-source models, while establishing a distinctive advantage in Korean-language benchmarks.
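The abstract does not spell out how the user-controlled mode switch is surfaced at inference time. As a minimal sketch, the snippet below shows one common way a single unified model exposes such a toggle through its prompt format; the special tokens (`<|user|>`, `<think>`) and the empty-reasoning-block convention are assumptions borrowed from other open-source hybrid-reasoning models, not A.X K1's documented template.

```python
# Minimal sketch of user-controlled thinking/non-thinking switching, in the
# spirit of the Think-Fusion recipe. All special tokens here are hypothetical.

def build_prompt(user_msg: str, enable_thinking: bool) -> str:
    """Render a single-turn chat prompt with an explicit reasoning toggle."""
    prompt = f"<|user|>\n{user_msg}\n<|assistant|>\n"
    if enable_thinking:
        # Thinking mode: open a reasoning block for the model to fill in
        # before it emits the final answer.
        prompt += "<think>\n"
    else:
        # Non-thinking mode: pre-close an empty reasoning block so the model
        # skips deliberation and answers directly.
        prompt += "<think>\n\n</think>\n"
    return prompt

if __name__ == "__main__":
    print(build_prompt("Prove that sqrt(2) is irrational.", enable_thinking=True))
    print(build_prompt("What is the capital of Korea?", enable_thinking=False))
```

Because both behaviors live in one set of weights, a serving system can flip the toggle per request rather than routing traffic to separate reasoning and chat models, which is what makes the single unified model attractive for scalable deployment.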