Learn from A Rationalist: Distilling Intermediate Interpretable Rationales

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of small-scale neural networks in rationale extraction: when supervised only by task labels, learning to select features faces an intractably large feature-combination search space and a resulting performance bottleneck. To overcome this, the authors propose REKD, a novel approach that leverages intermediate rationales generated by a teacher model as supervision signals for knowledge distillation. Within a select-then-predict framework, the student model jointly learns to identify rationales and make predictions, enabling structured and verifiable knowledge transfer. The method is compatible with arbitrary black-box teacher models and various backbone architectures—including BERT and Vision Transformers—and demonstrates significant performance gains on IMDB, CIFAR-10, and CIFAR-100, confirming its effectiveness across both language and vision tasks.

📝 Abstract
Because of the pervasive use of deep neural networks (DNNs), especially in high-stakes domains, the interpretability of DNNs has received increased attention. The general idea of rationale extraction (RE) is to provide an interpretable-by-design framework for DNNs via a select-predict architecture in which two neural networks learn jointly to perform feature selection and prediction, respectively. Given only the remote supervision from the final task prediction, learning to select subsets of features (or "rationales") requires searching the space of all possible feature combinations, which is computationally challenging and even harder when the base neural networks are not sufficiently capable. To improve the predictive performance of RE models built on less capable or smaller neural networks (i.e., the students), we propose REKD (Rationale Extraction with Knowledge Distillation), where a student RE model learns from the rationales and predictions of a teacher (i.e., a "rationalist") in addition to the student's own RE optimization. This structural adjustment to RE aligns well with how humans learn effectively from interpretable and verifiable knowledge. Because the method is neural-model agnostic, any black-box neural network can be integrated as a backbone model. To demonstrate the viability of REKD, we conduct experiments with multiple variants of BERT and Vision Transformer (ViT) models. Our experiments across language and vision classification datasets (IMDB movie reviews, CIFAR-10, and CIFAR-100) show that REKD significantly improves the predictive performance of the student RE models.
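The abstract's training setup can be illustrated with a minimal sketch: the student is supervised by the task label as usual, plus two distillation terms — one matching the teacher's soft predictions, one matching the teacher's extracted rationale mask. This is an illustrative reconstruction, not the paper's exact formulation; the loss functions, and the weights `alpha` and `beta`, are assumptions.

```python
# Sketch of a REKD-style objective for a select-then-predict student.
# All loss choices and weightings here are illustrative assumptions.
import numpy as np

def cross_entropy(target, pred, eps=1e-12):
    """CE between a target distribution and a predicted distribution."""
    return -float(np.sum(target * np.log(pred + eps)))

def mask_bce(teacher_mask, student_mask, eps=1e-12):
    """Per-feature BCE: supervises the student's rationale selection
    with the teacher's (binary) rationale mask."""
    return -float(np.mean(
        teacher_mask * np.log(student_mask + eps)
        + (1 - teacher_mask) * np.log(1 - student_mask + eps)))

def rekd_loss(student_pred, student_mask, teacher_pred, teacher_mask,
              gold_label, alpha=0.5, beta=0.5):
    """Combine (1) the usual remote task loss of select-then-predict RE,
    (2) distillation from the teacher's soft predictions, and
    (3) supervision from the teacher's intermediate rationale mask."""
    task_loss = cross_entropy(gold_label, student_pred)      # task supervision
    pred_kd = cross_entropy(teacher_pred, student_pred)      # prediction distillation
    rationale_kd = mask_bce(teacher_mask, student_mask)      # rationale distillation
    return task_loss + alpha * pred_kd + beta * rationale_kd
```

A student whose predictions and selected rationales agree with the teacher (and the gold label) incurs a lower loss than one that ignores both, which is the structured, verifiable transfer signal the paper describes.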
Problem

Research questions and friction points this paper is trying to address.

interpretability
rationale extraction
deep neural networks
feature selection
knowledge distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rationale Extraction
Knowledge Distillation
Interpretable AI
Neural Network Compression
Model Interpretability