Explainable AI: Context-Aware Layer-Wise Integrated Gradients for Explaining Transformer Models

📅 2026-02-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing interpretability methods struggle to jointly model the interplay among cross-layer dependencies, contextual interactions, and structural components in Transformers, and their collective influence on predictions. To address this gap, the authors propose CA-LIG, a novel framework that, for the first time, integrates class-oriented attention gradients with layer-wise integrated gradients within each Transformer layer to construct context-aware symbolic attribution graphs. These graphs reveal supporting and opposing evidence along with their cross-layer propagation pathways. CA-LIG establishes a unified, hierarchical, and context-sensitive attribution mechanism that simultaneously captures local token contributions, global attention patterns, and the evolution of cross-layer dependencies. Extensive experiments across multi-task, multilingual, and multimodal Transformer models demonstrate that CA-LIG consistently outperforms existing approaches, yielding explanations that are both more faithful and more semantically coherent.

📝 Abstract
Transformer models achieve state-of-the-art performance across domains and tasks, yet their deeply layered representations make their predictions difficult to interpret. Existing explainability methods rely on final-layer attributions, capture either local token-level attributions or global attention patterns without unification, and lack context-awareness of inter-token dependencies and structural components. They also fail to capture how relevance evolves across layers and how structural components shape decision-making. To address these limitations, we propose the Context-Aware Layer-wise Integrated Gradients (CA-LIG) Framework, a unified hierarchical attribution framework that computes layer-wise Integrated Gradients within each Transformer block and fuses these token-level attributions with class-specific attention gradients. This integration yields signed, context-sensitive attribution maps that capture supportive and opposing evidence while tracing the hierarchical flow of relevance through the Transformer layers. We evaluate the CA-LIG Framework across diverse tasks, domains, and Transformer model families, including sentiment analysis and long-document and multi-class document classification with BERT, hate speech detection in a low-resource language setting with XLM-R and AfroLM, and image classification with a Masked Autoencoder Vision Transformer. Across all tasks and architectures, CA-LIG provides more faithful attributions, shows stronger sensitivity to contextual dependencies, and produces clearer, more semantically coherent visualizations than established explainability methods. These results indicate that CA-LIG provides a more comprehensive, context-aware, and reliable explanation of Transformer decision-making, advancing both the practical interpretability and conceptual understanding of deep neural models.
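The core mechanism described in the abstract can be illustrated with a toy sketch: compute Integrated Gradients for one layer's inputs (signed, per-token attributions satisfying the completeness axiom), then reweight them with a context vector derived from attention gradients. This is not the paper's implementation; the linear surrogate model, the `attn_grad` scores, and the softmax fusion rule are all simplifying assumptions chosen to keep the example self-contained.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=64):
    """Midpoint Riemann-sum approximation of the IG path integral
    from `baseline` to `x` for a model with gradient `grad_f`."""
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.mean(
        [grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0
    )
    return (x - baseline) * avg_grad  # signed, per-token attributions

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy "layer": f(x) = w . x, so the gradient is simply w (an assumption;
# a real Transformer layer would need autodiff through the block).
w = np.array([0.5, -1.2, 2.0, 0.3])
f = lambda x: float(np.dot(w, x))
grad_f = lambda x: w

x = np.array([1.0, 0.8, -0.5, 2.0])   # toy token representations
baseline = np.zeros_like(x)           # zero baseline, as is common for IG

lig = integrated_gradients(grad_f, x, baseline)

# Hypothetical class-specific attention-gradient scores for the same tokens.
attn_grad = np.array([0.2, 1.5, 0.9, 0.1])
context_weight = softmax(attn_grad)

# Fusion: reweight the signed IG attributions by the context weights,
# so sign (supporting vs. opposing evidence) is preserved while
# contextually salient tokens are emphasized.
fused = lig * context_weight
```

Because the weights are non-negative, the fused map keeps each token's sign, which is what lets supporting and opposing evidence remain distinguishable after fusion; for the linear surrogate, the IG attributions also sum exactly to `f(x) - f(baseline)` (completeness).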
Problem

Research questions and friction points this paper is trying to address.

Explainable AI
Transformer models
context-awareness
attribution methods
interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context-Aware Layer-wise Integrated Gradients
Transformer Explainability
Hierarchical Attribution
Integrated Gradients
Attention-Attribution Fusion