🤖 AI Summary
Large language models (LLMs) exhibit limited capability in jointly optimizing multiple parameters for composite tasks in code-based sensor data processing (e.g., IMU, ECG, audio), hindering their deployment in embedded perception systems.
Method: We introduce SensorBench—the first dedicated benchmark for evaluating LLMs on sensor data processing—built upon diverse real-world sensing datasets and comprising two task categories: code generation and code interpretation. It systematically assesses LLM performance under four prompting paradigms: zero-shot, chain-of-thought, self-consistency, and self-verification prompting.
Contribution/Results: Experiments reveal robust LLM performance on simple tasks but significant degradation on compositionally complex, multi-parameter optimization tasks, where LLMs underperform domain experts. Self-verification prompting outperforms all other prompting strategies on 48% of tasks. The benchmark, along with reproducible evaluation protocols and practical prompting guidelines tailored for sensor development, establishes a foundational methodology for integrating LLMs into embedded sensing applications.
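To make the self-verification paradigm concrete, here is a minimal sketch of a generate–verify–revise loop. The helper names (`self_verify`, `toy_generate`, `toy_verify`) and the toy critique are illustrative stand-ins for real LLM API calls, not the paper's implementation:

```python
def self_verify(task: str, generate, verify, max_rounds: int = 3) -> str:
    """Generate a candidate solution, have the model critique it against the
    task description, and regenerate with the critique until it passes."""
    candidate = generate(task)
    for _ in range(max_rounds):
        critique = verify(task, candidate)
        if critique == "PASS":
            return candidate
        candidate = generate(
            f"{task}\nPrevious attempt:\n{candidate}\nCritique:\n{critique}"
        )
    return candidate

# Deterministic stand-ins for LLM calls (hypothetical): the first attempt
# uses one-pass filtering; the verifier flags it, and the revision fixes it.
def toy_generate(prompt: str) -> str:
    return "filtfilt(b, a, x)" if "Critique" in prompt else "lfilter(b, a, x)"

def toy_verify(task: str, candidate: str) -> str:
    if "filtfilt" in candidate:
        return "PASS"
    return "Zero-phase filtering required; use filtfilt."

print(self_verify("Apply a zero-phase low-pass filter to ECG",
                  toy_generate, toy_verify))
# → filtfilt(b, a, x)
```

In the actual benchmark setting, `generate` and `verify` would both be calls to the same LLM with different prompts; the loop structure is the point here.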
📝 Abstract
Effective processing, interpretation, and management of sensor data have emerged as critical components of cyber-physical systems. Traditionally, processing sensor data requires profound theoretical knowledge and proficiency in signal-processing tools. However, recent works show that Large Language Models (LLMs) have promising capabilities in processing sensory data, suggesting their potential as copilots for developing sensing systems. To explore this potential, we construct a comprehensive benchmark, SensorBench, to establish a quantifiable objective. The benchmark incorporates diverse real-world sensor datasets for various tasks. The results show that while LLMs exhibit considerable proficiency in simpler tasks, they face inherent challenges in processing compositional tasks with parameter selections compared to engineering experts. Additionally, we investigate four prompting strategies for sensor processing and show that self-verification can outperform all other baselines in 48% of tasks. Our study provides a comprehensive benchmark and prompting analysis for future developments, paving the way toward an LLM-based sensor processing copilot.
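To illustrate the kind of compositional, parameter-selection task the abstract refers to, here is a small sketch (not taken from the benchmark): even a one-pole low-pass filter requires jointly reasoning about sampling rate and cutoff frequency to derive a correct smoothing factor. All names here (`ema_alpha`, `lowpass`) are hypothetical:

```python
import math

def ema_alpha(cutoff_hz: float, fs_hz: float) -> float:
    """Smoothing factor for a one-pole low-pass filter, derived from the
    RC-circuit discretization: alpha = dt / (RC + dt)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs_hz
    return dt / (rc + dt)

def lowpass(samples, cutoff_hz: float, fs_hz: float):
    """Apply an exponential-moving-average low-pass filter to a signal."""
    alpha = ema_alpha(cutoff_hz, fs_hz)
    y = samples[0]
    out = []
    for s in samples:
        y = y + alpha * (s - y)  # first-order IIR update
        out.append(y)
    return out

# Example: suppress a Nyquist-rate oscillation riding on a DC level of 1.0.
noisy = [1 + (-1) ** n for n in range(200)]   # alternates 2, 0, 2, 0, ...
smoothed = lowpass(noisy, cutoff_hz=1.0, fs_hz=100.0)
```

Choosing the cutoff here interacts with the sampling rate and the signal band of interest; an error in either parameter silently degrades the output, which is exactly the failure mode the benchmark probes.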