Interpreto: An Explainability Library for Transformers

📅 2025-12-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing interpretability tools for Hugging Face models predominantly focus on feature-level attribution, lack unified support for both classification and generation tasks, and struggle to balance semantic interpretability with usability. This paper introduces Interpreto, an open-source Python library that unifies attribution-based methods (e.g., gradient- and perturbation-based approaches) with concept-level explanations (e.g., Concept Activation Vectors and textual concept mining) in a single post-hoc explanation framework. Designed to be model- and architecture-agnostic, Interpreto provides a consistent API compatible with models ranging from BERT to large language models, enabling plug-and-play interpretability across both classification and generation tasks. Built on PyTorch and the Hugging Face Transformers ecosystem, it ships with comprehensive documentation, tutorials, and pip-installable packaging, aiming at semantic clarity and ease of use for practitioners in industry and academia.
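The attribution families named above can be illustrated generically. The sketch below shows the perturbation-based idea as leave-one-out occlusion in plain Python; it is not Interpreto's actual API, and the toy word-counting scorer is a stand-in for a real Transformer classifier:

```python
# Generic perturbation-based (occlusion) attribution sketch.
# NOT Interpreto's API -- an illustration of the underlying idea:
# remove each token in turn and measure how much the model score drops.

def score(tokens):
    # Toy stand-in for a classifier's score on a token sequence
    # (a real model would be a Transformer forward pass).
    positive = {"great", "good", "excellent"}
    return sum(1.0 for t in tokens if t in positive)

def occlusion_attributions(tokens):
    base = score(tokens)
    # Attribution of token i = score drop when token i is removed.
    return {
        i: base - score(tokens[:i] + tokens[i + 1:])
        for i in range(len(tokens))
    }

tokens = ["the", "movie", "was", "great"]
attrs = occlusion_attributions(tokens)
# "great" gets attribution 1.0; neutral tokens get 0.0
```

Gradient-based methods follow the same contract (one relevance score per input token) but obtain it from backpropagated gradients instead of repeated forward passes, which is cheaper for long inputs.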

📝 Abstract
Interpreto is a Python library for post-hoc explainability of Hugging Face text models, from early BERT variants to LLMs. It provides two complementary families of methods: attributions and concept-based explanations. The library connects recent research to practical tooling for data scientists, aiming to make explanations accessible to end users, and includes documentation, examples, and tutorials. Interpreto supports both classification and generation models through a unified API. A key differentiator is its concept-based functionality, which goes beyond feature-level attributions and is uncommon in existing libraries. The library is open source; install it with pip install interpreto. Code and documentation are available at https://github.com/FOR-sight-ai/interpreto.
Problem

Research questions and friction points this paper is trying to address.

Existing Hugging Face interpretability tools focus predominantly on feature-level attribution
Few libraries support both classification and generation tasks in one framework
Semantic interpretability is hard to balance with usability for end users
Innovation

Methods, ideas, or system contributions that make the work stand out.

Python library for explainability of Transformer models
Provides attribution and concept-based explanation methods
Unified API supports classification and generation models
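On the concept-level side, a Concept Activation Vector (CAV) can be sketched generically as a linear direction in a model's activation space separating examples that exhibit a concept from examples that do not. The snippet below uses synthetic activations and a difference-of-means estimator (a simplification of the trained linear probe typically used in practice) and is not Interpreto's API:

```python
import numpy as np

# Generic Concept Activation Vector (CAV) sketch -- NOT Interpreto's API.
# A CAV points from "concept absent" toward "concept present" in a
# model's hidden-activation space. Here the direction is estimated as a
# normalized difference of class means over synthetic 4-dim activations.

rng = np.random.default_rng(0)

concept_acts = rng.normal(loc=1.0, size=(50, 4))  # concept present
random_acts = rng.normal(loc=0.0, size=(50, 4))   # concept absent

# CAV = unit-norm difference of means.
cav = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
cav /= np.linalg.norm(cav)

def concept_score(activation):
    # Projection onto the CAV: higher means more concept-aligned.
    return float(activation @ cav)

hi = concept_score(np.ones(4))   # concept-like activation
lo = concept_score(np.zeros(4))  # neutral activation
```

With real models, the activations would come from a chosen Transformer layer and the concept/non-concept split from labeled probe data; the projection then yields a per-example concept sensitivity score.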
Antonin Poché
Research engineer at IRT Saint Exupéry
Explainability, Interpretability, Artificial Intelligence, Open Source
Thomas Mullor
IRT Saint Exupéry Toulouse
Gabriele Sarti
PhD Student, University of Groningen
natural language processing, interpretability, human-computer interaction, deep learning
Frédéric Boisnard
Ampere
Corentin Friedrich
IRT Saint Exupéry Toulouse
Charlotte Claye
MICS, CentraleSupélec, Scienta Lab
François Hoofd
IRT Saint Exupéry Toulouse, Thales Avionics
Raphael Bernas
MICS, CentraleSupélec
Céline Hudelot
MICS, CentraleSupélec
Fanny Jourdan
Researcher at IRT Saint Exupéry
Natural Language Processing, Explainability, Fairness, Interpretability