AI Summary
Medical AI systems lack a standardized logging mechanism, which impedes rigorous real-world performance evaluation, adverse event traceability, bias identification, and data drift monitoring. To address this, we propose MedLog, the first event-level logging protocol specifically designed for clinical AI applications. MedLog defines nine core fields (header, model, user, target, inputs, artifacts, outputs, outcomes, and feedback) and supports risk-based sampling, lifecycle-aware retention policies, and write-behind caching to minimize the data footprint and remain compatible with resource-constrained and heterogeneous clinical infrastructures. Implemented as a lightweight, interoperable middleware, MedLog enables fully auditable, end-to-end invocation tracing, including detailed traces for complex, agentic, or multi-stage workflows. It supports transparent regulatory oversight, continuous performance benchmarking, and dynamic risk surveillance. Empirical validation across diverse clinical AI deployments demonstrates negligible latency overhead (<2.3 ms per log entry) and seamless integration with existing EHR and AI orchestration systems. MedLog establishes foundational infrastructure for safe, scalable clinical AI deployment and digital epidemiology research.
Abstract
Modern computer systems often rely on syslog, a simple, universal protocol that records every critical event across heterogeneous infrastructure. However, healthcare's rapidly growing clinical AI stack has no equivalent. As hospitals rush to pilot large language models and other AI-based clinical decision support tools, we still lack a standard way to record how, when, by whom, and for whom these AI models are used. Without that transparency and visibility, it is challenging to measure real-world performance and outcomes, detect adverse events, or correct bias or dataset drift. In the spirit of syslog, we introduce MedLog, a protocol for event-level logging of clinical AI. Any time an AI model is invoked to interact with a human, interface with another algorithm, or act independently, a MedLog record is created. This record consists of nine core fields: header, model, user, target, inputs, artifacts, outputs, outcomes, and feedback, providing a structured and consistent record of model activity. To encourage early adoption, especially in low-resource settings, and minimize the data footprint, MedLog supports risk-based sampling, lifecycle-aware retention policies, and write-behind caching; detailed traces for complex, agentic, or multi-stage workflows can also be captured under MedLog. MedLog can catalyze the development of new databases and software to store and analyze MedLog records. Realizing this vision would enable continuous surveillance, auditing, and iterative improvement of medical AI, laying the foundation for a new form of digital epidemiology.
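To make the record structure concrete, the following is a minimal Python sketch of how a MedLog-style record, risk-based sampling, and write-behind caching might fit together. Only the nine top-level field names come from the abstract. Everything else is an assumption made for illustration and is not part of the protocol as described here: the sub-keys inside each field, the risk tiers and their sampling rates, the batch size, and the names `MedLogRecord`, `should_log`, `WriteBehindLog`, and `make_record`.

```python
import json
import random
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical sampling rates per risk tier (illustrative, not from MedLog).
SAMPLE_RATES = {"high": 1.0, "medium": 0.5, "low": 0.1}


@dataclass
class MedLogRecord:
    """One record per model invocation, holding the nine core fields."""
    header: Dict[str, Any]
    model: Dict[str, Any]
    user: Dict[str, Any]
    target: Dict[str, Any]
    inputs: Dict[str, Any]
    artifacts: Dict[str, Any] = field(default_factory=dict)
    outputs: Dict[str, Any] = field(default_factory=dict)
    outcomes: Dict[str, Any] = field(default_factory=dict)
    feedback: Dict[str, Any] = field(default_factory=dict)


def should_log(risk: str, rng: random.Random) -> bool:
    """Risk-based sampling: keep every high-risk event, sample the rest.

    Unknown risk tiers fall back to a rate of 1.0, so they are always logged.
    """
    return rng.random() < SAMPLE_RATES.get(risk, 1.0)


class WriteBehindLog:
    """Buffers serialized records in memory and writes them out in batches."""

    def __init__(self, sink: Callable[[List[str]], None], batch_size: int = 3):
        self._sink = sink              # e.g. a function that writes to disk
        self._batch_size = batch_size
        self._buffer: List[str] = []

    def append(self, record: MedLogRecord) -> None:
        self._buffer.append(json.dumps(asdict(record)))
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Hand the pending batch to the sink and start a fresh buffer."""
        if self._buffer:
            self._sink(self._buffer)
            self._buffer = []


def make_record(risk: str, model_name: str, user_id: str,
                patient_id: str, prompt: str, answer: str) -> MedLogRecord:
    """Build a record; all sub-keys here are illustrative placeholders."""
    return MedLogRecord(
        header={"id": str(uuid.uuid4()), "timestamp": time.time(), "risk": risk},
        model={"name": model_name},
        user={"id": user_id},
        target={"patient_id": patient_id},
        inputs={"prompt": prompt},
        outputs={"text": answer},
    )


if __name__ == "__main__":
    batches: List[List[str]] = []
    log = WriteBehindLog(sink=batches.append, batch_size=2)
    rng = random.Random(0)
    for risk in ["high", "low", "high", "medium", "high"]:
        if should_log(risk, rng):
            log.append(make_record(risk, "demo-model", "clinician-1",
                                   "patient-1", "example input", "example output"))
    log.flush()  # write out whatever is still buffered
    print(f"{sum(len(b) for b in batches)} records in {len(batches)} batches")
```

In this sketch the caller never waits on storage: records accumulate in memory and reach the sink only when a batch fills or `flush()` is called. The trade-off is that buffered records are lost if the process dies before a flush, so a real deployment would likely pair this with a durable queue or a timed flush.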