DataMosaic: Explainable and Verifiable Multi-Modal Data Analytics through Extract-Reason-Verify

📅 2025-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the weak interpretability, poor verifiability, and hallucination susceptibility of large language models (LLMs) in multi-modal data analysis, this paper proposes DataMosaic, a multi-agent framework built around an "Extract-Reason-Verify" paradigm. The framework (1) dynamically extracts task-relevant structures (e.g., tables, graphs, trees) from text, tabular, and image data to produce transparent, traceable reasoning chains; (2) orchestrates adaptive multi-agent collaboration to improve analytical consistency, completeness, and privacy preservation; and (3) combines cross-modal semantic alignment with verification-driven iterative refinement to address the unreliable reasoning of retrieval-augmented generation (RAG) on noisy, heterogeneous data. Experiments on real-world multi-modal datasets are reported to show over 92% verifiability of intermediate reasoning steps and 100% traceability and auditability of the full inference process.

📝 Abstract

Large Language Models (LLMs) are transforming data analytics, but their widespread adoption is hindered by two critical limitations: they are not explainable (opaque reasoning processes) and not verifiable (prone to hallucinations and unchecked errors). While retrieval-augmented generation (RAG) improves accuracy by grounding LLMs in external data, it fails to address the core challenges of trustworthy analytics, especially when processing noisy, inconsistent, or multi-modal data (for example, text, tables, images). We propose DataMosaic, a framework designed to make LLM-powered analytics both explainable and verifiable. By dynamically extracting task-specific structures (for example, tables, graphs, trees) from raw data, DataMosaic provides transparent, step-by-step reasoning traces and enables validation of intermediate results. Built on a multi-agent framework, DataMosaic orchestrates self-adaptive agents that align with downstream task requirements, enhancing consistency, completeness, and privacy. Through this approach, DataMosaic not only tackles the limitations of current LLM-powered analytics systems but also lays the groundwork for a new paradigm of grounded, accurate, and explainable multi-modal data analytics.
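
The extract, reason, and verify stages described in the abstract can be illustrated with a toy sketch. This is not the DataMosaic implementation: the function names (`extract`, `reason`, `verify`, `analyze`), the `Trace` class, and the "name: value" input format are all hypothetical, chosen only to show how a logged trace and a check of intermediate results against the raw data might fit together.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical step-by-step reasoning trace: a list of (stage, detail)."""
    steps: list = field(default_factory=list)

    def log(self, stage, detail):
        self.steps.append((stage, detail))


def extract(raw_text, trace):
    """Extract a small table from noisy 'name: value' lines (assumed format)."""
    rows = []
    for line_no, line in enumerate(raw_text.splitlines()):
        if ":" not in line:
            continue  # noise line with no recognizable structure
        name, _, value = line.partition(":")
        try:
            rows.append({"line_no": line_no, "name": name.strip(),
                         "value": float(value)})
        except ValueError:
            trace.log("extract", f"dropped unparsable line {line_no}")
    trace.log("extract", f"built table with {len(rows)} rows")
    return rows


def reason(table, trace):
    """Answer a toy question (the total) over the extracted table."""
    total = sum(row["value"] for row in table)
    trace.log("reason", f"summed {len(table)} values")
    return total


def verify(raw_text, table, answer, trace):
    """Check that each row is grounded in its source line and recompute."""
    lines = raw_text.splitlines()
    grounded = all(row["name"] in lines[row["line_no"]] for row in table)
    recomputed = sum(row["value"] for row in table)
    ok = grounded and abs(recomputed - answer) < 1e-9
    trace.log("verify", "passed" if ok else "failed")
    return ok


def analyze(raw_text):
    """Run the three stages in order and return answer, verdict, and trace."""
    trace = Trace()
    table = extract(raw_text, trace)
    answer = reason(table, trace)
    ok = verify(raw_text, table, answer, trace)
    return answer, ok, trace
```

Calling `analyze("q1: 10\nnoise line\nq2: 5.5\nq3: n/a")` returns the total of the two parsable rows together with a trace that records the dropped line, so a reader can audit which inputs contributed to the answer. A real system would replace each stage with LLM-driven agents; the point here is only that every stage writes to a shared trace and that verification consults the raw data rather than trusting the extracted table.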

Problem

Research questions and friction points this paper is trying to address.

Addresses opaque reasoning in LLM analytics

Solves verifiability issues with noisy multi-modal data

Enhances explainability via structured extraction and validation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic extraction of task-specific data structures

Multi-agent framework for adaptive analytics

Explainable and verifiable step-by-step reasoning
