LogDB: Multivariate Log-based Failure Diagnosis for Distributed Databases (Extended from MultiLog)

📅 2025-05-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing log-based approaches for diagnosing failures in distributed databases suffer from insufficient system specificity and limited capability for distributed coordination. To address this, we propose the first log diagnosis framework explicitly designed for the internal architecture and multi-node distribution characteristics of distributed databases. Our method performs lightweight log feature extraction and compression at each node, followed by distributed feature aggregation at the coordinator node, and integrates multivariate time-series modeling for cluster-level anomaly detection. Unlike generic log analysis methods, our framework explicitly models inter-node dependencies and database-specific semantic features. Experiments on Apache IoTDB demonstrate high robustness across typical failures—including network partitioning, node crashes, and query blocking—as well as under diverse workloads, achieving an average F1-score of 92.7%, significantly outperforming baseline approaches.

Technology Category

Application Category

📝 Abstract
Distributed databases, as the core infrastructure software for internet applications, play a critical role in modern cloud services. However, existing distributed databases frequently experience system failures and performance degradation, often leading to significant economic losses. Log data, naturally generated within systems, can effectively reflect internal system states. In practice, operators often manually inspect logs to monitor system behavior and diagnose anomalies, a process that is labor-intensive and costly. Although various log-based failure diagnosis methods have been proposed, they are generally not tailored for database systems and fail to fully exploit the internal characteristics and distributed nature of these systems. To address this gap, we propose LogDB, a log-based failure diagnosis method specifically designed for distributed databases. LogDB extracts and compresses log features at each database node and then aggregates these features at the master node to diagnose cluster-wide anomalies. Experiments conducted on the open-source distributed database system Apache IoTDB demonstrate that LogDB achieves robust failure diagnosis performance across different workloads and a variety of anomaly types.
Problem

Research questions and friction points this paper is trying to address.

Diagnosing failures in distributed databases using logs
Overcoming labor-intensive manual log inspection methods
Addressing limitations of existing log-based diagnosis approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

LogDB extracts and compresses log features
Aggregates features at master node
Tailored for distributed database systems