🤖 AI Summary
To address the limitations of existing models in Traditional Chinese multimodal understanding and the challenges of lightweight deployment, this paper introduces the Breeze 2 series (3B/8B), built upon the Llama 3 architecture and featuring the first Traditional Chinese–optimized multimodal design: (i) integration of a ViT-based visual encoder with a cross-modal bridging module for end-to-end joint image-text modeling; and (ii) incorporation of template-guided function calling and multi-task alignment fine-tuning to enhance instruction following and tool-use robustness. Evaluated on benchmarks spanning Taiwan-specific commonsense reasoning, long-context comprehension, visual question answering, and function calling, Breeze 2 achieves state-of-the-art performance across all tasks. The 3B variant supports efficient on-device deployment. All models are open-sourced under the Llama 3 Community License, establishing the first publicly available, production-ready foundation for Traditional Chinese multimodal research and applications.
📝 Abstract
Breeze 2 is a suite of advanced multi-modal language models, available in 3B and 8B parameter configurations, specifically designed to enhance Traditional Chinese language representation. Building upon Llama 3, Breeze 2 continues pretraining on an extensive corpus to deepen its coverage of the linguistic and cultural heritage of Traditional Chinese. It incorporates vision-aware capabilities through a visual encoder and a bridge module, and supports function-calling via prompt templates and post-training on function-calling data. The effectiveness of Breeze 2 is benchmarked across various tasks, including Taiwan general knowledge, instruction-following, long context, function calling, and vision understanding. Furthermore, we showcase the capabilities of its 3B model in a mobile application. We are publicly releasing all Breeze 2 models under the Llama 3 Community License.
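The abstract mentions that function calling is supported via prompt templates plus post-training on function-calling data. A minimal sketch of how such template-guided function calling typically works is shown below; the template wording, tool schema, and helper names here are illustrative assumptions, not Breeze 2's actual prompt format.

```python
import json

# Hypothetical tool schema (illustrative only; not from the Breeze 2 paper).
TOOL_SCHEMA = {
    "name": "get_weather",
    "description": "Query current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def build_function_calling_prompt(user_query: str, tools: list) -> str:
    """Embed tool schemas into the prompt via a fixed template,
    so the model can decide whether to emit a structured tool call."""
    tool_block = "\n".join(json.dumps(t, ensure_ascii=False) for t in tools)
    return (
        "You may call one of the following tools by replying with a JSON "
        'object of the form {"name": ..., "arguments": {...}}.\n'
        f"Tools:\n{tool_block}\n\nUser: {user_query}"
    )


def parse_tool_call(model_output: str):
    """Parse the model's reply; return the tool call dict if one was emitted,
    otherwise None (the reply is then treated as plain text)."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return None
    if isinstance(call, dict) and "name" in call and "arguments" in call:
        return call
    return None


prompt = build_function_calling_prompt("What's the weather in Taipei?", [TOOL_SCHEMA])
# A well-trained model would answer with a parseable tool call, e.g.:
call = parse_tool_call('{"name": "get_weather", "arguments": {"city": "Taipei"}}')
```

After the call is parsed, the application executes the named function and feeds the result back to the model for the final answer; post-training on function-calling data teaches the model to follow this template reliably.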