Towards Practical Multi-label Causal Discovery in High-Dimensional Event Sequences via One-Shot Graph Aggregation

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Efficient discovery of multi-label causal relationships in high-dimensional, sparse event sequences (such as clinical symptoms or vehicle diagnostic trouble codes) remains challenging because exhaustive conditional independence testing is computationally prohibitive. Method: We propose a two-stage causal discovery framework: (1) parallel inference of one-shot causal graphs for individual sequences using two pretrained causal Transformers; and (2) adaptive frequency-based fusion that aggregates these per-sequence graphs to reconstruct the global Markov boundaries of the labels as a structured, multi-label causal graph. Contribution/Results: Our approach circumvents the computational bottleneck of full-dataset conditional independence tests, enabling the first practical large-scale multi-label causal discovery. Evaluated on a real-world automotive fault dataset comprising over 29,100 event types and 474 imbalanced labels, the method demonstrates superior scalability and robustness, significantly outperforming existing baselines.

📝 Abstract
Understanding causality in event sequences, where outcome labels such as diseases or system failures arise from preceding events like symptoms or error codes, is critical, yet it remains an unsolved challenge across domains such as healthcare and vehicle diagnostics. We introduce CARGO, a scalable multi-label causal discovery method for sparse, high-dimensional event sequences comprising thousands of unique event types. Using two pretrained causal Transformers as domain-specific foundation models for event sequences, CARGO infers one-shot causal graphs per sequence in parallel and aggregates them using an adaptive frequency fusion to reconstruct the global Markov boundaries of labels. This two-stage approach enables efficient probabilistic reasoning at scale while bypassing the intractable cost of full-dataset conditional independence testing. Our results on a challenging real-world automotive fault prediction dataset with over 29,100 unique event types and 474 imbalanced labels demonstrate CARGO's ability to perform structured reasoning.
Problem

Research questions and friction points this paper is trying to address.

Understanding causality in event sequences with multiple outcome labels
Scalable causal discovery in high-dimensional sparse event sequences
Bypassing intractable cost of full-dataset conditional independence testing
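
To give a rough sense of why exhaustive conditional independence (CI) testing does not scale here, the sketch below counts the tests a naive constraint-based search could require in the worst case. The helper `worst_case_ci_tests` and the cap `max_cond_size` are illustrative assumptions, not quantities from the paper; only the figures of roughly 29,100 event types and 474 labels come from the abstract.

```python
from math import comb


def worst_case_ci_tests(n_events: int, n_labels: int, max_cond_size: int) -> int:
    """Worst-case number of CI tests over all (event, label) pairs.

    Assumption for illustration: each pair is tested once against every
    conditioning set of size 0..max_cond_size drawn from the other events.
    """
    others = n_events - 1  # events left over to condition on
    sets_per_pair = sum(comb(others, k) for k in range(max_cond_size + 1))
    return n_events * n_labels * sets_per_pair


# Dataset scale quoted in the abstract (the event count is a lower bound).
for size in range(3):
    print(size, worst_case_ci_tests(29_100, 474, size))
```

Under these assumptions the count grows by several orders of magnitude with each additional conditioning variable, which is the cost that per-sequence inference followed by aggregation is meant to avoid.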
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-shot causal graph inference per sequence
Adaptive frequency fusion for graph aggregation
Bypassing full-dataset conditional independence testing
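
The aggregation stage can be sketched as follows. This is a minimal illustration, not CARGO's actual implementation: the per-sequence graphs would come from the pretrained Transformers but are hand-written toy inputs here, the names `aggregate_graphs` and `min_ratio` are hypothetical, and the mean-frequency cutoff is only a stand-in for the paper's adaptive frequency fusion, whose exact rule is not given in this summary.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set

# One per-sequence graph: label -> set of events flagged as its causes.
SequenceGraph = Dict[str, Set[str]]


def aggregate_graphs(graphs: List[SequenceGraph], min_ratio: float = 1.0) -> Dict[str, Set[str]]:
    """Fuse per-sequence graphs into one candidate Markov boundary per label."""
    support = Counter()           # label -> number of sequences containing it
    votes = defaultdict(Counter)  # label -> event -> number of sequences with that edge
    for graph in graphs:
        for label, causes in graph.items():
            support[label] += 1
            votes[label].update(causes)

    boundaries: Dict[str, Set[str]] = {}
    for label, counts in votes.items():
        # Edge frequency relative to how often the label itself occurs.
        freqs = {event: n / support[label] for event, n in counts.items()}
        if not freqs:
            boundaries[label] = set()
            continue
        # Hypothetical adaptive rule: the cutoff is derived per label from
        # that label's own mean edge frequency, so rare labels are not
        # held to the same absolute threshold as frequent ones.
        cutoff = min_ratio * sum(freqs.values()) / len(freqs)
        boundaries[label] = {e for e, f in freqs.items() if f >= cutoff}
    return boundaries


# Toy per-sequence graphs (invented event and label names).
graphs = [
    {"fault_A": {"e1", "e2"}},
    {"fault_A": {"e1"}},
    {"fault_A": {"e1", "e3"}, "fault_B": {"e4"}},
]
boundaries = aggregate_graphs(graphs)
print(boundaries)
```

Because each per-sequence graph is produced independently, the first stage parallelizes trivially, and the fusion step only needs edge counts rather than another pass over the full dataset.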