Traceable Latent Variable Discovery Based on Multi-Agent Collaboration

📅 2026-02-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes TLVD, a multi-agent framework that integrates large language models (LLMs) with causal discovery. Traditional causal discovery algorithms typically assume there are no latent confounders, so they can neither identify hidden variables nor say what those variables mean or where the supporting evidence comes from. TLVD first builds a causal graph containing latent variables with conventional data-driven algorithms. It then uses the metadata reasoning capabilities of several collaborating LLMs to infer the semantic meaning of each latent variable, modeling the collaboration as a game with incomplete information and solving for its Bayesian Nash equilibrium. Finally, LLM-driven evidence exploration across multiple web-based data sources makes the inferred variables traceable and verifiable. Evaluated on three de-identified hospital patient datasets and two benchmark datasets, the approach reports average improvements over existing methods of 32.67% in accuracy (Acc), 62.21% in causal accuracy (CAcc), and 26.72% in evidence citation rate (ECit).

📝 Abstract

Revealing the underlying causal mechanisms in the real world is crucial for scientific and technological progress. Despite notable advances in recent decades, the lack of high-quality data and the reliance of traditional causal discovery algorithms (TCDA) on the assumption of no latent confounders, as well as their tendency to overlook the precise semantics of latent variables, have long been major obstacles to the broader application of causal discovery. To address this issue, we propose a novel causal modeling framework, TLVD, which integrates the metadata-based reasoning capabilities of large language models (LLMs) with the data-driven modeling capabilities of TCDA for inferring latent variables and their semantics. Specifically, we first employ a data-driven approach to construct a causal graph that incorporates latent variables. Then, we employ multi-LLM collaboration for latent variable inference, modeling this process as a game with incomplete information and seeking its Bayesian Nash Equilibrium (BNE) to infer the possible specific latent variables. Finally, to validate the inferred latent variables across multiple real-world web-based data sources, we leverage LLMs for evidence exploration to ensure traceability. We comprehensively evaluate TLVD on three de-identified real patient datasets provided by a hospital and two benchmark datasets. Extensive experimental results confirm the effectiveness and reliability of TLVD, with average improvements of 32.67% in Acc, 62.21% in CAcc, and 26.72% in ECit across the five datasets.
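To make the Bayesian Nash Equilibrium step more concrete, here is a minimal sketch of a toy game with incomplete information. It is not TLVD's actual formulation, which the abstract does not specify. Everything in it is an assumption for illustration: two hypothetical LLM agents each hold a private evidence type ("strong" or "weak") about one candidate latent variable, each votes to endorse or reject it, and the payoff numbers, the 50/50 independent type prior, and all function names are made up. The code enumerates every pure strategy (a map from type to action) and keeps the profiles where each agent's choice is a best response in expectation over the other agent's unknown type.

```python
from itertools import product

# Toy setup (all values are illustrative assumptions, not taken from the paper).
TYPES = ("strong", "weak")          # private evidence strength of an agent
ACTIONS = ("endorse", "reject")     # vote on one candidate latent variable
TYPE_PROB = {"strong": 0.5, "weak": 0.5}         # independent common prior
ENDORSE_VALUE = {"strong": 2.0, "weak": -0.5}    # payoff for endorsing, by own type
AGREEMENT_BONUS = 1.0               # extra payoff when both agents vote the same way


def utility(own_type, own_action, other_action):
    """Payoff of one agent given its type and both agents' actions."""
    base = ENDORSE_VALUE[own_type] if own_action == "endorse" else 0.0
    bonus = AGREEMENT_BONUS if own_action == other_action else 0.0
    return base + bonus


def expected_utility(own_type, own_action, other_strategy):
    """Average payoff over the other agent's unknown type.

    Types are independent here, so the conditional belief equals the prior.
    """
    return sum(
        TYPE_PROB[t] * utility(own_type, own_action, other_strategy[t])
        for t in TYPES
    )


def is_best_response(strategy, other_strategy, tol=1e-9):
    """True if, for every own type, the chosen action maximizes expected payoff."""
    for t in TYPES:
        chosen = expected_utility(t, strategy[t], other_strategy)
        best = max(expected_utility(t, a, other_strategy) for a in ACTIONS)
        if chosen < best - tol:
            return False
    return True


def pure_bayesian_nash_equilibria():
    """Enumerate all pure-strategy profiles and keep mutual best responses."""
    strategies = [
        dict(zip(TYPES, acts)) for acts in product(ACTIONS, repeat=len(TYPES))
    ]
    return [
        (s1, s2)
        for s1, s2 in product(strategies, repeat=2)
        if is_best_response(s1, s2) and is_best_response(s2, s1)
    ]


if __name__ == "__main__":
    for s1, s2 in pure_bayesian_nash_equilibria():
        print("agent 1:", s1, "| agent 2:", s2)
```

With these assumed payoffs the game has two pure equilibria: one where both agents endorse only on strong evidence, and one where both always endorse. Brute-force enumeration is only practical for tiny games like this one; it is used here because it makes the equilibrium condition easy to read.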

Problem

Research questions and friction points this paper is trying to address.

causal discovery

latent confounders

latent variable semantics

data quality

causal mechanisms

Innovation

Methods, ideas, or system contributions that make the work stand out.

latent variable discovery

multi-agent collaboration

Bayesian Nash Equilibrium