If Only My CGM Could Speak: A Privacy-Preserving Agent for Question Answering over Continuous Glucose Data

📅 2026-04-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing diabetes management platforms, which offer only static glucose summaries and lack support for natural language queries over continuous glucose monitoring (CGM) data. Direct use of large language models (LLMs) poses risks of privacy leakage and unreliable outputs. To overcome these challenges, the authors propose CGM-Agent, a novel framework that leverages an LLM as a reasoning controller to dynamically select and invoke local, deterministic analytical functions, ensuring sensitive data remains on-device. The study introduces the first benchmark dataset tailored for CGM question answering and demonstrates strong performance, achieving 94% accuracy on synthetic queries and 88% on real user queries. These results validate the feasibility of deploying lightweight, privacy-preserving models at the edge. The code and dataset are publicly released.
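The controller pattern described above can be illustrated with a minimal sketch. It assumes the LLM returns a structured tool call (a function name plus arguments) and that a small local dispatcher runs the matching deterministic function, so raw readings never enter the prompt. The function names, the `READINGS` sample data, and the tool-call format are hypothetical and are not taken from the CGM-Agent code.

```python
from statistics import mean

# Hypothetical on-device readings: (hour_of_day, glucose in mg/dL).
READINGS = [(0, 110), (6, 95), (8, 150), (12, 185), (18, 200), (22, 65)]


def mean_glucose(readings, start_hour=0, end_hour=24):
    """Average glucose for readings with start_hour <= hour < end_hour."""
    values = [g for h, g in readings if start_hour <= h < end_hour]
    return round(mean(values), 1)


def time_in_range(readings, low=70, high=180):
    """Percentage of readings with low <= glucose <= high."""
    in_range = sum(1 for _, g in readings if low <= g <= high)
    return round(100 * in_range / len(readings), 1)


# Registry of deterministic local functions the model may select from.
TOOLS = {"mean_glucose": mean_glucose, "time_in_range": time_in_range}


def execute_tool_call(call, readings):
    """Run a model-selected function locally; the model only sees the result."""
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](readings, **call.get("arguments", {}))


# Assumed shape of what the LLM might emit for
# "How much of my day was I in range?"
call = {"name": "time_in_range", "arguments": {"low": 70, "high": 180}}
result = execute_tool_call(call, READINGS)
```

In this arrangement the model contributes only the choice of function and its parameters; the numeric answer comes from ordinary code, which is what makes the output reproducible and checkable.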

📝 Abstract

Continuous glucose monitors (CGMs) used in diabetes care collect rich personal health data that could improve day-to-day self-management. However, current patient platforms only offer static summaries that do not support inquisitive user queries. Large language models (LLMs) could enable free-form inquiries about continuous glucose data, but deploying them over sensitive health records raises privacy and accuracy concerns. In this paper, we present CGM-Agent, a privacy-preserving framework for question answering over personal glucose data. In our design, the LLM serves purely as a reasoning engine that selects analytical functions. All computation occurs locally, and personal health data never leaves the user's device. For evaluation, we construct a benchmark of 4,180 questions combining parameterized question templates with real user queries and ground truth derived from deterministic program execution. Evaluating 6 leading LLMs, we find that top models achieve 94% value accuracy on synthetic queries and 88% on ambiguous real-world queries. Errors stem primarily from intent and temporal ambiguity rather than computational failures. Additionally, lightweight models achieve competitive performance in our agent design, suggesting opportunities for low-cost deployment. We release our code and benchmark to support future work on trustworthy health agents.
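As a rough illustration of how parameterized templates and deterministic execution can yield benchmark items with ground truth, consider the sketch below. The templates, parameter grids, helper functions, and the tolerance used for value accuracy are all assumptions made for this example; the released benchmark may define them differently.

```python
from itertools import product

# Hypothetical readings: (hour_of_day, glucose in mg/dL).
READINGS = [(0, 110), (6, 95), (8, 150), (12, 185), (18, 200), (22, 65)]


def max_glucose(readings, start_hour, end_hour):
    """Highest glucose for readings with start_hour <= hour < end_hour."""
    return max(g for h, g in readings if start_hour <= h < end_hour)


def count_below(readings, threshold):
    """Number of readings strictly below the threshold."""
    return sum(1 for _, g in readings if g < threshold)


# Each template pairs question text with the program that answers it
# and a grid of parameter values to substitute (all illustrative).
TEMPLATES = [
    ("What was my highest reading between {start_hour}:00 and {end_hour}:00?",
     max_glucose, {"start_hour": [0, 6], "end_hour": [12, 24]}),
    ("How many readings were below {threshold} mg/dL?",
     count_below, {"threshold": [70, 100]}),
]


def build_benchmark(templates, readings):
    """Expand every template over its grid; ground truth comes from execution."""
    items = []
    for text, func, grid in templates:
        names = list(grid)
        for combo in product(*(grid[n] for n in names)):
            params = dict(zip(names, combo))
            items.append({"question": text.format(**params),
                          "answer": func(readings, **params)})
    return items


def value_accuracy(predictions, benchmark, tol=1e-6):
    """Fraction of predictions within tol of the executed ground truth."""
    correct = sum(abs(p - item["answer"]) <= tol
                  for p, item in zip(predictions, benchmark))
    return correct / len(benchmark)


benchmark = build_benchmark(TEMPLATES, READINGS)
```

Because every answer is produced by running a program over the data, a wrong prediction can be attributed to the model's choice of function or parameters rather than to an error in the labels.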

Problem

Research questions and friction points this paper is trying to address.

continuous glucose monitoring

privacy-preserving

question answering

large language models

personal health data

Innovation

Methods, ideas, or system contributions that make the work stand out.

privacy-preserving

continuous glucose monitoring

large language models