RDMM: Fine-Tuned LLM Models for On-Device Robotic Decision Making with Enhanced Contextual Awareness in Specific Domains

📅 2025-01-28

🤖 AI Summary
To address the insufficient real-time decision-making capability of domestic service robots on resource-constrained edge devices, this paper proposes RDMM, a lightweight, domain-specific, on-device decision-making model. Methodologically, we introduce an on-device large language model (LLM) framework that supports capability-aware reasoning and context-aware adaptation; integrate ViT, Whisper, and CLIP for multimodal visual and auditory understanding; and design a real-time vision–language–action joint planning architecture. Through model fine-tuning, inference optimization, and deployment under an 8 GB memory constraint, the entire pipeline runs fully on the device. Key contributions include: (1) releasing a new household task-oriented dataset comprising 27K planning instances and 1.3K image–text annotations; (2) achieving 93% planning accuracy; and (3) open-sourcing all models, code, benchmarks, and datasets to support real-time deployment on edge hardware.
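To make the idea of capability-aware planning concrete, here is a hypothetical sketch that is not the paper's method: the abstract attributes capability awareness to the fine-tuned model itself, whereas this snippet only parses a plan written as semicolon-separated skill calls and flags any step that uses a skill outside an assumed skill set. The plan format, the `ROBOT_SKILLS` set, and the `parse_plan` and `unsupported_steps` helpers are all illustrative assumptions.

```python
import re
from dataclasses import dataclass

# Hypothetical skill vocabulary; the actual RDMM action set is not listed here.
ROBOT_SKILLS = frozenset({"navigate", "pick", "place", "find", "speak"})

# Matches one call such as "pick(cup)" or "place(cup, table)".
CALL_RE = re.compile(r"^\s*([a-z_]+)\((.*)\)\s*$")


@dataclass
class Step:
    skill: str
    args: list[str]


def parse_plan(text: str) -> list[Step]:
    """Parse a semicolon-separated plan string into Step objects."""
    steps = []
    for raw in text.split(";"):
        if not raw.strip():
            continue  # ignore empty segments, e.g. from a trailing ";"
        m = CALL_RE.match(raw)
        if m is None:
            raise ValueError(f"malformed step: {raw!r}")
        skill, arg_str = m.groups()
        args = [a.strip() for a in arg_str.split(",") if a.strip()]
        steps.append(Step(skill, args))
    return steps


def unsupported_steps(steps: list[Step],
                      skills: frozenset[str] = ROBOT_SKILLS) -> list[Step]:
    """Return the steps whose skill is not in the robot's assumed skill set."""
    return [s for s in steps if s.skill not in skills]
```

For example, `parse_plan("navigate(kitchen); pick(cup); fly(roof)")` yields three steps, and `unsupported_steps` on that result returns only the `fly` step. A post-hoc check like this is cheap enough to run on an edge device and acts as a safeguard if a planner emits an action the robot cannot execute.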

📝 Abstract
Large language models (LLMs) represent a significant advance in integrating physical robots with AI-driven systems. This research introduces a framework built on RDMM (Robotics Decision-Making Models), which can make decisions within domain-specific contexts and are aware of their own knowledge and capabilities. The framework leverages this information to enhance the system's autonomous decision-making. We showcase the framework's capabilities in the context of a real-world household competition. In contrast to other approaches, we focus on real-time, on-device solutions that run on hardware with as little as 8 GB of memory. The framework incorporates visual perception models that give robots an understanding of their environment, and it integrates real-time speech recognition to improve human-robot interaction. Experimental results show that the RDMM framework plans with 93% accuracy. Furthermore, we introduce a new dataset of 27k planning instances and 1.3k annotated text-image samples derived from the competition. The framework, benchmarks, datasets, and models developed in this work are publicly available in our GitHub repository at https://github.com/shadynasrat/RDMM.
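The 8 GB figure limits which models can run on the device. A common rule of thumb for the weights alone is parameters × bits per parameter ÷ 8 bytes. The sketch below applies it; the abstract does not state RDMM's model sizes or quantization, so the 7B parameter count and the `reserved_gb` allowance (for the vision and speech models, activations, KV cache, and the operating system) are hypothetical values chosen for illustration, and `weight_memory_gb` and `fits_budget` are illustrative helpers.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in decimal GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def fits_budget(num_params: float, bits_per_param: int,
                budget_gb: float = 8.0, reserved_gb: float = 2.0) -> bool:
    """Check whether weights plus a reserved allowance fit in the memory budget.

    reserved_gb is a hypothetical allowance for other models, activations,
    KV cache, and the OS; it is not a figure from the paper.
    """
    return weight_memory_gb(num_params, bits_per_param) + reserved_gb <= budget_gb


if __name__ == "__main__":
    # Hypothetical 7B-parameter model at several weight precisions.
    for bits in (16, 8, 4):
        gb = weight_memory_gb(7e9, bits)
        print(f"{bits:>2}-bit: {gb:.1f} GB of weights, fits 8 GB: {fits_budget(7e9, bits)}")
```

Under these assumptions, 16-bit weights (14.0 GB) and 8-bit weights (7.0 GB plus the 2 GB allowance) exceed the budget, while 4-bit weights (3.5 GB) fit, which is why low-bit quantization is a typical lever for on-device deployment.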
Problem

Real-time Decision-making
Resource-constrained Devices
Robotics Intelligence
Innovation

RDMM System
On-device Decision-making
Fine-tuned Large Language Model
Shady Nasrat
Faculty of Electrical Engineering, Pusan National University, Busan, South Korea
Myungsu Kim
Faculty of Electrical Engineering, Pusan National University, Busan, South Korea
Seonil Lee
Faculty of Electrical Engineering, Pusan National University, Busan, South Korea
Jiho Lee
StradVision
Computer Vision, Deep Learning, Domain Generalization
Yeoncheol Jang
Faculty of Electrical Engineering, Pusan National University, Busan, South Korea
Seung-joon Yi
Faculty of Electrical Engineering, Pusan National University, Busan, South Korea