🤖 AI Summary
Critical domains such as defense and intelligence lack rigorous, human-centered tools for designing and evaluating trustworthy AI systems.
Method: This study introduces PADTHAI-MM, a principle-based iterative design framework built around the Multisource AI Scorecard Table (MAST), establishing a MAST-driven, closed-loop design process. The framework integrates NLP-based text analysis, data visualization, human-AI collaborative evaluation, and stakeholder participatory design, and is demonstrated through the development of READIT, a prototype intelligent report assistant for intelligence work.
Contribution/Results: The study empirically examines the quantitative relationship between stakeholder evaluators' MAST ratings and three categories of information known to impact human trust (process, purpose, and performance). It proposes a comparative validation methodology contrasting High- and Low-MAST versions of the same system. Experimental results indicate that the High-MAST version of READIT, which exposes AI contextual information and explanations, enhances user trust relative to the Low-MAST "black box" version, supporting MAST's effectiveness and context-specific applicability in trustworthy AI design and evaluation.
📝 Abstract
Despite an extensive body of literature on trust in technology, designing trustworthy AI systems for high-stakes decision domains remains a significant challenge, further compounded by the lack of actionable design and evaluation tools. The Multisource AI Scorecard Table (MAST) was designed to bridge this gap by offering a systematic, tradecraft-centered approach to evaluating AI-enabled decision support systems. Expanding on MAST, we introduce an iterative design framework called *Principles-based Approach for Designing Trustworthy, Human-centered AI using MAST Methodology* (PADTHAI-MM). We demonstrate this framework in our development of the Reporting Assistant for Defense and Intelligence Tasks (READIT), a research platform that leverages data visualizations and natural language processing-based text analysis, emulating an AI-enabled system supporting intelligence reporting work. To empirically assess the efficacy of MAST on trust in AI, we developed two distinct iterations of READIT for comparison: a High-MAST version, which incorporates AI contextual information and explanations, and a Low-MAST version, akin to a "black box" system. This iterative design process, guided by stakeholder feedback and contemporary AI architectures, culminated in a prototype that was evaluated through its use in an intelligence reporting task. We further discuss the potential benefits of employing the MAST-inspired design framework to address context-specific needs. We also explore the relationship between stakeholder evaluators' MAST ratings and three categories of information known to impact trust: *process*, *purpose*, and *performance*. Overall, our study supports the practical benefits and theoretical validity of PADTHAI-MM as a viable method for designing trustworthy, context-specific AI systems.