MindMirror: A Local-First Multimodal State-Aware Support System for Digital Workers

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the fatigue, anxiety, and diminished attention that digital workers commonly experience during prolonged computer use, for which existing tools offer limited real-time, lightweight support. The authors propose a local-first, multimodal state-awareness and assistance system that integrates facial expression recognition (94.49% accuracy on a seven-class benchmark of 6,767 images, up from 59.66% for the non-fine-tuned checkpoint), text input, optional voice interaction, and structured blockage reflection. A locally deployed large language model generates suggestions and daily/weekly state-review reports. The prototype uses a web frontend and a Flask backend, stores records locally (JSON and LocalStorage), and lets users manually correct inferred states; optional speech services may rely on third-party APIs when enabled. A small formative feedback study with six digital workers suggests that users value the local-first design, the manual correction mechanism, and the structured reflection workflow.

📝 Abstract

Digital workers often experience fatigue, anxiety, reduced attention, and task blockage during prolonged computer-based work. Existing productivity tools mainly focus on task completion, while general-purpose AI chatbots require users to formulate clear prompts before receiving useful help. This paper presents MindMirror, a local-first multimodal state-aware support system for digital workers. MindMirror integrates camera-based facial expression cues, text input, optional speech interaction, structured blockage reflection, local large language model (LLM)-based response generation, and daily/weekly review reports. The system forms a closed workflow of state checking, manual correction, structured articulation, suggestion generation, and state review. The current prototype follows a local-first design, while optional speech services may rely on third-party APIs when enabled. It is implemented with a Web frontend, Flask backend, an emotion recognition model, an Ollama-hosted Qwen model, Chart.js visualization, and local JSON/LocalStorage records. We evaluate the emotion recognition module on an independent seven-class image-level facial expression benchmark containing 6,767 images. The fine-tuned Hugging Face model improves accuracy from 59.66% to 94.49% over a non-fine-tuned checkpoint baseline, an absolute gain of 34.83 percentage points. We further validate the prototype through endpoint-level reliability tests, voice-interaction latency tests, and a small formative user feedback study with six digital workers. Results suggest that users value the local-first design, manual correction mechanism, and structured reflection workflow. MindMirror is not intended for psychological diagnosis; instead, it serves as a lightweight, user-controllable tool for state reflection and supportive interaction.
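The closed workflow described in the abstract (state check, manual correction, structured articulation, suggestion generation, local record) can be illustrated with a minimal sketch. It is not the authors' implementation: the `StateRecord` fields, the helper names, the state labels, the prompt wording, and the `"qwen"` model tag are all hypothetical placeholders. The payload uses the `model`, `prompt`, and `stream` fields of Ollama's `/api/generate` request body, but the sketch only builds the payload and does not send it.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Optional


@dataclass
class StateRecord:
    """One state check. All field names are illustrative assumptions."""
    timestamp: str
    detected_state: str                    # label inferred by the emotion model
    corrected_state: Optional[str] = None  # manual correction by the user, if any
    blockage_note: str = ""                # free-text structured reflection

    @property
    def effective_state(self) -> str:
        # The user's correction always takes precedence over the inferred label.
        return self.corrected_state or self.detected_state


def append_record(path: Path, record: StateRecord) -> list:
    """Append a record to a local JSON file and return all stored records."""
    records = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    records.append(asdict(record))
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return records


def build_ollama_payload(record: StateRecord, model: str = "qwen") -> dict:
    """Build a request body for a locally hosted model (model tag is a placeholder)."""
    prompt = (
        f"The user reports feeling '{record.effective_state}'. "
        f"Their current blockage: {record.blockage_note or 'not specified'}. "
        "Offer one short, supportive, non-diagnostic suggestion."
    )
    return {"model": model, "prompt": prompt, "stream": False}
```

A record whose inferred label was wrong can be corrected before it is stored or used in a prompt: setting `corrected_state` changes `effective_state` while leaving `detected_state` intact for later review.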

Problem

Research questions and friction points this paper is trying to address.

digital workers

mental state awareness

task blockage

fatigue

attention decline

Innovation

Methods, ideas, or system contributions that make the work stand out.

local-first

multimodal state-aware

emotion recognition

structured reflection

on-device LLM

🔎 Similar Papers

Multimodal Machine Learning in Mental Health: A Survey of Data, Algorithms, and Challenges

2024-07-23 · arXiv.org · Citations: 1

EmBARDiment: an Embodied AI Agent for Productivity in XR

2024-08-15 · arXiv.org · Citations: 0