🤖 AI Summary
Existing AI assistants rely on explicit user queries, so they fail to proactively detect and correct inefficient operations, such as redundant edits, in spreadsheet applications like Excel. The guidance they do give arrives late and tends to be generic and hard to act on.
Method: We propose InvisibleMentor, an end-to-end vision-grounded reflection system that works directly from screen recordings and requires no APIs, instrumentation logs, or explicit user input. A vision-language model parses interface states and reconstructs fine-grained user action sequences; a large language model then generates structured, context-aware optimization suggestions.
Contribution: We introduce a two-stage behavioral inference pipeline that separates action reconstruction from recommendation generation. In evaluation, the system accurately identified inefficient workflows, and participants found its suggestions more actionable, more tailored, and more helpful for learning and improvement than those of a prompt-based spreadsheet assistant.
📝 Abstract
Many users struggle to notice when a more efficient workflow exists in feature-rich tools like Excel. Existing AI assistants offer help only after users describe their goals or problems, which can be effortful and imprecise. We present InvisibleMentor, a system that turns screen recordings of task completion into vision-grounded reflections on how the task was performed. It detects issues such as repetitive edits and recommends more efficient alternatives based on observed behavior. Unlike prior systems that rely on logs, APIs, or user prompts, InvisibleMentor operates directly on screen recordings. It uses a two-stage pipeline: a vision-language model reconstructs actions and context, and a language model generates structured, high-fidelity suggestions. In evaluation, InvisibleMentor accurately identified inefficient workflows, and participants found its suggestions more actionable, more tailored, and more helpful for learning and improvement than those of a prompt-based spreadsheet assistant.
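The two-stage structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Action` and `Suggestion` schemas, the function names, and the run-length heuristic are all assumptions made for illustration. Both model calls are represented as plain callables, and a toy rule-based function stands in for the language model so that the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Action:
    """One reconstructed user action (hypothetical schema)."""
    kind: str    # e.g. "type_formula", "format_cell"
    target: str  # e.g. a cell reference such as "B2"


@dataclass
class Suggestion:
    """One structured suggestion (hypothetical schema)."""
    issue: str
    evidence: List[str]
    alternative: str


# Stage 1 (assumed interface): screen-recording frames -> action sequence.
# In the paper this role is played by a vision-language model.
ActionReconstructor = Callable[[Sequence[bytes]], List[Action]]
# Stage 2 (assumed interface): action sequence -> structured suggestions.
# In the paper this role is played by a language model.
SuggestionGenerator = Callable[[List[Action]], List[Suggestion]]


def find_repeated_runs(actions: List[Action], min_run: int = 3) -> List[List[Action]]:
    """Return runs of at least `min_run` consecutive actions of the same kind."""
    runs: List[List[Action]] = []
    start = 0
    for i in range(1, len(actions) + 1):
        # A run ends at the end of the list or when the action kind changes.
        if i == len(actions) or actions[i].kind != actions[start].kind:
            if i - start >= min_run:
                runs.append(actions[start:i])
            start = i
    return runs


def rule_based_generator(actions: List[Action]) -> List[Suggestion]:
    """Toy stand-in for stage 2: flags repetitive edits with a fixed rule."""
    return [
        Suggestion(
            issue=f"{len(run)} consecutive '{run[0].kind}' actions",
            evidence=[a.target for a in run],
            alternative="Consider one bulk operation (for example, fill-down) "
                        "instead of repeating the same edit.",
        )
        for run in find_repeated_runs(actions)
    ]


def reflect(frames: Sequence[bytes],
            reconstruct: ActionReconstructor,
            generate: SuggestionGenerator) -> List[Suggestion]:
    """Run the two stages in order: reconstruct actions, then generate suggestions."""
    return generate(reconstruct(frames))


def demo_reconstruct(frames: Sequence[bytes]) -> List[Action]:
    """Fake stage 1 that ignores the frames and returns a fixed action sequence."""
    return [Action("type_formula", f"B{row}") for row in range(2, 6)] + [
        Action("format_cell", "A1")
    ]
```

Calling `reflect([], demo_reconstruct, rule_based_generator)` returns a single suggestion whose evidence lists cells B2 through B5. Keeping the two stages behind separate callables mirrors the separation the abstract describes and makes it possible to replace either stand-in with a real model call without changing `reflect`.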