Do Proactive Agents Really Need an LLM to Decide When to Wake and What to Anchor?

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses the high latency and computational redundancy of conventional proactive agents, which render user events as text and invoke a large language model (LLM) on every event to parse them. The authors instead model user activity as a structured temporal event stream and use a lightweight Temporal Graph Learning (TGL) model that consumes the operating system's native graph data directly. The LLM is invoked only when the trigger fires, to generate the user-facing response. By eliminating the inefficient “structure-to-text-and-back” round-trip, a single TGL forward pass produces both the per-event trigger decision and the per-entity routing scores in real time. Experiments show that TGL improves F1 on each of 14 backbones (mean +16.7, up to +46.0), gives the strongest trigger AUCs and the most stable deployed threshold in trigger-architecture comparisons, and needs only 11.13 ms per event on a GPU server and 13.99 ms on a consumer laptop. That is approximately 4–7× (GPU) and 12–83× (laptop) faster than every single-forward LLM-as-trigger configuration tested, with a resident BF16 footprint of approximately 220 MiB that makes on-device deployment practical.
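To make the “structured temporal event stream” concrete, here is a minimal sketch, assuming each event is an (actor, verb, object, timestamp) tuple as the abstract describes, stored as a timestamped, verb-labelled edge in a continuous-time graph. The names `Event`, `TemporalGraph`, `update`, and `recent_neighbors` are hypothetical illustrations, not the paper's code or any operating-system API; a TGL encoder would consume updates like these instead of a rendered text log.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One user-activity event: (actor, verb, object, timestamp)."""
    actor: str
    verb: str
    obj: str
    timestamp: float


@dataclass
class TemporalGraph:
    """Hypothetical continuous-time graph: each event becomes a timestamped,
    verb-labelled edge between actor and object; nothing is rendered as text."""
    edges: list = field(default_factory=list)
    neighbors: dict = field(default_factory=lambda: defaultdict(list))
    last_seen: dict = field(default_factory=dict)

    def update(self, ev: Event) -> None:
        # Streaming assumption: events arrive in non-decreasing time order.
        if self.edges and ev.timestamp < self.edges[-1].timestamp:
            raise ValueError("events must be appended in time order")
        self.edges.append(ev)
        # Record the interaction on both endpoints so either side can be queried.
        self.neighbors[ev.actor].append((ev.obj, ev.verb, ev.timestamp))
        self.neighbors[ev.obj].append((ev.actor, ev.verb, ev.timestamp))
        self.last_seen[ev.actor] = ev.timestamp
        self.last_seen[ev.obj] = ev.timestamp

    def recent_neighbors(self, node: str, k: int = 3) -> list:
        """Return the k most recent interactions touching `node`, newest first."""
        return list(reversed(self.neighbors[node][-k:]))


g = TemporalGraph()
g.update(Event("user", "opened", "report.pdf", 10.0))
g.update(Event("user", "emailed", "alice", 12.5))
g.update(Event("alice", "edited", "report.pdf", 20.0))
print(g.recent_neighbors("report.pdf"))  # [('alice', 'edited', 20.0), ('user', 'opened', 10.0)]
print(g.last_seen["user"])               # 12.5
```

Keeping per-node interaction histories and last-update times is one common way temporal graph models organize such a stream; whether the paper's encoder uses exactly this layout is not stated here.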

📝 Abstract

Proactive agents read user activity as text and call an LLM on every event to decide whether to act. But user activity is not natively text: it is a structured event stream of (actor, verb, object, timestamp) tuples that the operating system already maintains in graph form. Rendering the structure as text and asking an LLM to recover it is a round-trip the system never had to take. We treat the always-on signal as graph updates rather than text and use a small temporal-graph-learning (TGL) model as the encoder: one forward pass yields a per-event trigger probability and a per-entity routing score, and only the downstream agent (turning a small structured handoff into a fluent user-facing sentence) is an LLM call, invoked only when the trigger fires. TGL improves F1 on each of 14 backbones (mean +16.7, up to +46.0); in trigger-architecture comparisons, one TGL checkpoint gives the strongest trigger AUCs and the most stable deployed threshold. It runs at 11.13 ms per event on a GPU server and 13.99 ms on a consumer laptop, approximately 4–7× and 12–83× faster than every single-forward LLM-as-trigger configuration tested in each regime, with an approximately 220 MiB BF16 resident footprint deployable on-device alongside the privacy-sensitive activity stream it consumes.
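The gating idea can be sketched as follows, assuming the TGL encoder has already produced a trigger probability and per-entity routing scores for each event (stubbed here as fixed values rather than a trained model). `EncoderOutput`, `handle_event`, `fake_llm`, the 0.5 threshold, and the top-2 anchor handoff are hypothetical choices for illustration; the paper's actual handoff format and deployed threshold value are not given here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class EncoderOutput:
    """Hypothetical per-event output of one encoder forward pass."""
    trigger_prob: float          # probability that the agent should act on this event
    routing: Dict[str, float]    # routing score for each candidate entity


def handle_event(out: EncoderOutput, threshold: float,
                 call_llm: Callable[[dict], str], top_k: int = 2) -> Optional[str]:
    """Invoke the downstream LLM only when the trigger fires."""
    if out.trigger_prob < threshold:
        return None  # non-trigger path: no LLM call at all
    # Anchor the response on the highest-scoring entities.
    anchors = sorted(out.routing, key=out.routing.get, reverse=True)[:top_k]
    handoff = {"trigger_prob": out.trigger_prob, "anchors": anchors}  # small structured handoff
    return call_llm(handoff)


calls: List[dict] = []


def fake_llm(handoff: dict) -> str:
    """Stand-in for the LLM agent that turns the handoff into a sentence."""
    calls.append(handoff)
    return f"Want me to open {handoff['anchors'][0]}?"


stream = [
    EncoderOutput(0.10, {"report.pdf": 0.7, "alice": 0.2}),
    EncoderOutput(0.92, {"report.pdf": 0.9, "alice": 0.6, "calendar": 0.1}),
    EncoderOutput(0.40, {"calendar": 0.8}),
]
replies = [handle_event(out, threshold=0.5, call_llm=fake_llm) for out in stream]
print(replies)     # [None, 'Want me to open report.pdf?', None]
print(len(calls))  # 1
```

The design point is that the expensive call sits behind a cheap comparison: on the common non-trigger path, the per-event cost is only the encoder's forward pass, which is where the reported per-event latency applies.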

Problem

Research questions and friction points this paper is trying to address.

proactive agents

LLM

structured event stream

temporal graph learning

trigger decision

Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Graph Learning

Proactive Agents

Event-triggered LLM