CAN-QA: A Question-Answering Benchmark for Reasoning over In-Vehicle CAN Traffic

📅 2026-04-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing intrusion detection methods for in-vehicle CAN bus traffic usually reduce the problem to classification, which discards temporal dependencies and semantic relationships in the traffic and hinders systematic forensic analysis. This work reframes CAN traffic analysis as a question-answering (QA) task: it segments raw logs into temporal windows and applies deterministic rule templates to generate natural-language QA pairs automatically. The resulting dataset, comprising 33,128 QA pairs, is the first structured QA benchmark for automotive security. Experiments show that large language models capture superficial statistical patterns but have significant limitations in temporal reasoning, multi-condition judgment, and high-level behavioral understanding, which indicates that the task demands complex semantic reasoning.

📝 Abstract

The Controller Area Network (CAN) is a safety-critical in-vehicle communication protocol that lacks built-in security mechanisms, making intrusion detection essential. Existing approaches predominantly formulate CAN intrusion detection as a classification task, mapping complex traffic patterns to attack labels. However, this formulation abstracts away the temporal and relational structure of CAN traffic and misaligns with real-world forensic workflows, which require systematic reasoning about traffic behavior. To address this gap, we introduce CAN-QA, the first benchmark that reformulates CAN traffic analysis as a question-answering (QA) task. CAN-QA converts raw CAN logs into temporally segmented windows and applies deterministic rule-based templates to generate natural-language questions paired with automatically derived ground-truth answers. The resulting dataset comprises 33,128 QA pairs across 10 categories, each targeting distinct semantic and temporal properties of CAN traffic. Using CAN-QA, we evaluate large language models across both True/False and multiple-choice formats. Our results indicate that, although these models capture superficial statistical regularities, they struggle with temporal reasoning, multi-condition inference, and higher-level behavioral interpretation. Our code is available at https://github.com/Kriiiiss/CAN-QA.
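The pipeline the abstract describes (window segmentation followed by deterministic templates) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Frame` layout, the one-second window length, the two templates, and every function name below are assumptions made for illustration only, and the frames are synthetic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical parsed CAN log record (field layout is an assumption)."""
    timestamp: float  # seconds since capture start
    can_id: int       # arbitration ID
    data: bytes       # payload, up to 8 bytes for classic CAN


def segment(frames, window_s):
    """Group frames into fixed-length, non-overlapping time windows."""
    windows = {}
    for f in frames:
        windows.setdefault(int(f.timestamp // window_s), []).append(f)
    return [windows[k] for k in sorted(windows)]


def most_frequent_id_question(window):
    """Hypothetical multiple-choice template: dominant CAN ID in a window."""
    counts = Counter(f.can_id for f in window)
    # Break ties on the lowest ID so the derived answer is deterministic.
    top_id, _ = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "question": "Which CAN ID appears most often in this window?",
        "options": [f"0x{i:03X}" for i in sorted(counts)],
        "answer": f"0x{top_id:03X}",
    }


def exceeds_count_question(window, can_id, threshold):
    """Hypothetical True/False template: frame count against a threshold."""
    n = sum(1 for f in window if f.can_id == can_id)
    return {
        "question": (
            f"Does CAN ID 0x{can_id:03X} appear more than "
            f"{threshold} times in this window?"
        ),
        "answer": n > threshold,
    }


# Synthetic frames, invented for this example (not taken from the dataset).
frames = [
    Frame(0.00, 0x316, bytes(8)),
    Frame(0.01, 0x000, bytes(8)),
    Frame(0.02, 0x000, bytes(8)),
    Frame(0.03, 0x000, bytes(8)),
    Frame(1.10, 0x316, bytes(8)),
    Frame(1.20, 0x43F, bytes(8)),
]
windows = segment(frames, window_s=1.0)
qa_pairs = [most_frequent_id_question(w) for w in windows]
qa_pairs.append(exceeds_count_question(windows[0], 0x000, 2))
```

Because each answer is computed directly from the window contents, the ground truth needs no manual labeling, which is the property the abstract attributes to rule-based generation. The benchmark's actual 10 categories and window parameters are defined in the authors' repository, not here.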

Problem

Research questions and friction points this paper is trying to address.

CAN intrusion detection

temporal reasoning

question-answering benchmark

in-vehicle network security

forensic analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

question-answering

CAN intrusion detection

temporal reasoning