AI Summary
This work addresses the challenge of analyzing the massive execution traces produced by large-scale systems such as operating system kernels, Chrome, and MySQL. Such traces are hard to interpret with existing tools, which either offer only predefined analyses or require error-prone, labor-intensive domain-specific scripts. The paper proposes TAAF, a novel framework that combines time-indexed knowledge graphs with large language models (LLMs) to support multi-hop and causal reasoning through a natural-language question-answering interface, substantially reducing reliance on manual expertise. On TraceQA-100, a benchmark newly introduced by the authors, TAAF improves answer accuracy by up to 31.2% over baseline methods, with particularly strong performance on complex reasoning tasks.
Abstract
Execution traces are a critical source of information for understanding, debugging, and optimizing complex software systems. However, traces from OS kernels or large-scale applications like Chrome or MySQL are massive and difficult to analyze. Existing tools rely on predefined analyses, and custom insights often require writing domain-specific scripts, a process that is error-prone and time-consuming. This paper introduces TAAF (Trace Abstraction and Analysis Framework), a novel approach that combines time-indexing, knowledge graphs (KGs), and large language models (LLMs) to transform raw trace data into actionable insights. TAAF constructs a time-indexed KG from trace events to capture relationships among entities such as threads, CPUs, and system resources. An LLM then interprets query-specific subgraphs to answer natural-language questions, reducing the need for manual inspection and deep system expertise. To evaluate TAAF, we introduce TraceQA-100, a benchmark of 100 questions grounded in real kernel traces. Experiments across three LLMs and multiple temporal settings show that TAAF improves answer accuracy by up to 31.2%, particularly in multi-hop and causal reasoning tasks. We further analyze where graph-grounded reasoning helps and where limitations remain, offering a foundation for next-generation trace analysis tools.
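To make the pipeline described in the abstract more concrete, here is a minimal Python sketch of a time-indexed knowledge graph with query-specific subgraph extraction. It assumes trace events arrive as (timestamp, source entity, relation, target entity) tuples, and that a "query-specific subgraph" means the edges inside a time window that are reachable within a fixed number of hops from some seed entities. The class `TimeIndexedKG`, the helper `to_prompt`, and the relation names (`RUNS_ON`, `WAITS_ON`, `HOLDS`) are hypothetical and are not taken from TAAF. The abstract does not give TAAF's actual schema, retrieval strategy, or LLM call, so the sketch stops at building the prompt text.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    ts: int    # event timestamp (hypothetical unit, e.g. ns)
    src: str   # source entity, e.g. "thread:42"
    rel: str   # relation label (hypothetical), e.g. "RUNS_ON"
    dst: str   # target entity, e.g. "cpu:0"


class TimeIndexedKG:
    """Hypothetical KG whose edges are kept sorted by event timestamp."""

    def __init__(self) -> None:
        self._ts: list[int] = []      # sorted timestamps, parallel to _edges
        self._edges: list[Edge] = []

    def add_event(self, ts: int, src: str, rel: str, dst: str) -> None:
        # Insert while preserving timestamp order (equal timestamps keep arrival order).
        i = bisect_right(self._ts, ts)
        self._ts.insert(i, ts)
        self._edges.insert(i, Edge(ts, src, rel, dst))

    def window(self, start: int, end: int) -> list[Edge]:
        # Two binary searches select every edge with start <= ts <= end.
        lo = bisect_left(self._ts, start)
        hi = bisect_right(self._ts, end)
        return self._edges[lo:hi]

    def subgraph(self, start: int, end: int, seeds: set[str], hops: int = 2) -> list[Edge]:
        """Edges in [start, end] reachable within `hops` hops of the seed entities."""
        edges = self.window(start, end)
        reached = set(seeds)
        picked: set[int] = set()
        for _ in range(hops):
            frontier: set[str] = set()
            for i, e in enumerate(edges):
                if i not in picked and (e.src in reached or e.dst in reached):
                    picked.add(i)
                    frontier.update((e.src, e.dst))
            reached |= frontier  # expand only after the hop completes
        return [edges[i] for i in sorted(picked)]  # back in time order


def to_prompt(edges: list[Edge], question: str) -> str:
    # Serialize the subgraph as timestamped facts for an LLM prompt.
    facts = "\n".join(f"t={e.ts}: {e.src} {e.rel} {e.dst}" for e in edges)
    return f"Trace facts:\n{facts}\n\nQuestion: {question}"


# Usage with toy events (illustrative only, not real trace data).
kg = TimeIndexedKG()
kg.add_event(100, "thread:42", "RUNS_ON", "cpu:0")
kg.add_event(150, "thread:42", "WAITS_ON", "lock:A")
kg.add_event(160, "thread:7", "HOLDS", "lock:A")
kg.add_event(400, "thread:9", "RUNS_ON", "cpu:1")

sub = kg.subgraph(100, 300, seeds={"thread:42"}, hops=2)
print(to_prompt(sub, "Why was thread 42 blocked between t=100 and t=300?"))
```

Keeping edges sorted by timestamp lets a time window be located with two binary searches, and the hop-bounded expansion is what lets a question about one thread pull in a related fact (here, the thread holding the lock it waits on) that a one-hop lookup would miss. With `hops=1`, the `HOLDS` edge would not be included.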