Optimizing Small Language Models for In-Vehicle Function-Calling

📅 2025-01-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the dual challenges of resource constraints on vehicular edge devices and the inflexibility of traditional rule-based systems, this paper proposes an efficient deployment framework for small language models (SLMs) tailored to vehicle control scenarios. Our method systematically integrates structured pruning, post-pruning “healing” training, and INT4 quantization, augmented by task-specific fine-tuning and lightweight inference runtime optimization. Using Phi-3 mini as the base model, we remove 2 billion parameters while preserving accuracy on complex vehicle control tasks—achieving function-calling accuracy nearly matching that of the full model. Under CPU-only execution (no hardware acceleration), the optimized model attains real-time inference at 11 tokens/s, satisfying automotive-grade constraints on memory footprint, computational capacity, and latency. The resulting solution enables offline, highly available, natural-language-driven human–vehicle interaction directly on-device, significantly enhancing both system robustness and user experience in automotive infotainment and control systems.
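The two compression steps named above can be sketched in miniature. This is an illustrative toy, not the paper's implementation: magnitude-based row pruning stands in for structured pruning, and a per-tensor symmetric scheme stands in for INT4 quantization; all function names and numbers below are hypothetical.

```python
# Toy sketch of two compression steps from the summary: structured pruning
# (here: drop weight-matrix rows with the smallest L2 norm) followed by
# symmetric INT4 weight quantization (4-bit codes in [-8, 7] + one scale).

def prune_rows(matrix, keep_ratio):
    """Structured pruning: keep the top keep_ratio fraction of rows by L2 norm."""
    norms = [sum(w * w for w in row) ** 0.5 for row in matrix]
    n_keep = max(1, int(len(matrix) * keep_ratio))
    keep = sorted(sorted(range(len(matrix)), key=lambda i: -norms[i])[:n_keep])
    return [matrix[i] for i in keep]

def quantize_int4(weights):
    """Symmetric INT4 quantization: integer codes in [-8, 7] plus a float scale."""
    scale = (max(abs(w) for w in weights) or 1.0) / 7.0
    return [max(-8, min(7, round(w / scale))) for w in weights], scale

layer = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [0.03, -0.01]]
pruned = prune_rows(layer, keep_ratio=0.5)        # the 2 near-zero rows are dropped
codes, scale = quantize_int4([w for row in pruned for w in row])
restored = [c * scale for c in codes]             # dequantized weights
# Each code fits in 4 bits; per-weight reconstruction error is at most scale / 2.
```

In practice the "healing" stage the summary mentions would then fine-tune the pruned network to recover accuracy before quantization is applied.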

📝 Abstract
We propose a holistic approach for deploying Small Language Models (SLMs) as function-calling agents within vehicles as edge devices, offering a more flexible and robust alternative to traditional rule-based systems. By leveraging SLMs, we simplify vehicle control mechanisms and enhance the user experience. Given the in-vehicle hardware constraints, we apply state-of-the-art model compression techniques, including structured pruning, healing, and quantization, ensuring that the model fits within the resource limitations while maintaining acceptable performance. Our work focuses on optimizing a representative SLM, Microsoft's Phi-3 mini, and outlines best practices for enabling embedded models, including compression, task-specific fine-tuning, and vehicle integration. We demonstrate that, despite a significant reduction in model size that removes up to 2 billion parameters from the original model, our approach preserves the model's ability to handle complex in-vehicle tasks accurately and efficiently. Furthermore, by executing the model in a lightweight runtime environment, we achieve a generation speed of 11 tokens per second, making real-time, on-device inference feasible without hardware acceleration. Our results demonstrate the potential of SLMs to transform vehicle control systems, enabling more intuitive interactions between users and their vehicles for an enhanced driving experience.
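The function-calling agent pattern the abstract describes — the SLM emits a structured call, and a thin on-device layer validates and executes it against the vehicle's API — can be sketched as follows. The function names, argument schema, and registry here are invented for illustration and are not taken from the paper.

```python
import json

# Hypothetical vehicle-function registry: function name -> required arguments.
REGISTRY = {
    "set_temperature": {"zone", "celsius"},
    "open_window": {"position", "percent"},
}

def dispatch(model_output: str):
    """Parse a model-emitted JSON function call and validate it before execution."""
    call = json.loads(model_output)
    name, args = call["name"], call["arguments"]
    if name not in REGISTRY:
        raise ValueError(f"unknown function: {name}")
    missing = REGISTRY[name] - args.keys()
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return name, args  # a real system would now invoke the vehicle API

# Example: the SLM has mapped "make it warmer on my side" to a structured call.
name, args = dispatch(
    '{"name": "set_temperature", "arguments": {"zone": "driver", "celsius": 21}}'
)
```

Validating against an explicit registry keeps the model's free-form output from reaching vehicle controls unchecked, which matters for the offline, safety-adjacent setting the paper targets.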
Problem

Research questions and friction points this paper is trying to address.

In-vehicle Language Models
Natural Interaction
Driver Experience Enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model Compression
Vehicle Function Control
Natural Interaction
Yahya S. Khiabani
Mercedes-Benz Research & Development North America
Farris Atif
Mercedes-Benz Research & Development North America
Chieh Hsu
Mercedes-Benz Research & Development North America
Sven Stahlmann
Mercedes-Benz Tech Innovation
Benedikt Heidrich
Data Scientist, MBTI; former KIT
M. Sarfraz
Mercedes-Benz Tech Innovation
Julian Merten
Mercedes-Benz Research & Development North America
Faezeh Tafazzoli
Mercedes-Benz Research & Development North America